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1* Introduction 

Hus report deals with considerations of the statistical aspect! 
of carbon fiber risk assessment modeling for fire accidents involving 
commercial aircraft. There are numerous advantages to using carbon 
fiber materials on aircraft; strength and weight reduction are two such 
examples. However, should an aircraft be involved in a fire accident, 
the possibility exists for a release of free carbon fibers to the atmo- 
sphere with a potential effect of some of these fibers infiltrating 
and shorting out electrical and electronic "machinery". 

The ultimate goal of the entire carbon fiber risk assessment pro 
gram was to determine risk profiles for this phenomenon; that is, curve, 
of potential damage values and their associated probabilities. Very 
comprehensive reviews of the factors affecting such profiles and of the 
entire risk assessment program itself are given by Huston (1979, 1980). 

The present study focuses on the statistical aspects influencing the 
development of the risk profiles. 

The next section of the report presents an overview of the sta- 
tistical problems encountered in producing risk profiles and identifies 
t e major sources of uncertainty. Sections 3, 4, and 5 treat each of 
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these major uncertainty sources in detail, namely Section 3 dealt . with 
imprecise knowledge in establishing the model. Section 4 treats th 
problems associated with model parameter estimation and Sect on 
centrates on sampling errors in the Monte Carlo simulation ana sis 
and obtaining confidence bounds on the results. Finally, 
provides a general framework for building in and obtaining conserva 

tism in risk profile generation. 

2 . Treatment of Errors and Uncertainty 
±n r.F Risk Analysis 

In assessing CF-related damage due to accidents of commercial 
aircraft, various uncertainties must be dealt with. The cost Incurred 
"I! . incident will vary according to the circumstances surrounding 
the incident, and the resulting uncertainty in the cost incur- “ 
therefore described by a risk profile. A similar risk profil 
to describe the uncertainty associated with the total cost incurred 
a year from CF incidents. The goal of risk analysis is to determ 
these profiles which reflect the inherent randomness of actual physical 

phenomena, 

in the pursuit of estimating these risk profiles, additional 
uncertainties (potential sources of error) are encountered. These can 
be classified into three general categories: (1) imperfections in^ h 

mathematical model of the physical Phenomena, (2) i-act -- 
Of the numerical and guantitative aspects of the model, (3) states 
error from simulation sampling. A careful modeling effort and a sta- 
tistically sound methodology can help to control these uncertaintie , 
thereby preventing them from contributing to misleading conclusions 
about the risk profiles. In the CF analyses performed here, these 
factors have generally been controlled or else dealt with conservative y, 
resulting in conservative estimates of the risk profiles. 

We will now elaborate on each of the three sources of uncer- 
tainty just mentioned and on how they were dealt with in the CF risk 

analysis. 


- 2 - 



T-419 


(1) Imperfect model . By its very nature, the mathematical model 
will be an approximate description of the physical phenomena. This re- 
sults from imprecise knowledge of the phenomena, the need for tract- 
ability, and factors that may have been overlooked. For example, in the 
CF model considerable aggregation is necessary, but this is performed 
cautiously by using conservative numerical quantities for entire classes 
of equipment, buildings, aircraft, etc. Deterministic cost values were 
used instead of distributions of costs, but it can be shown that the 
n law of averages'* implies minimal error propagation due to this modeling 
simplification. Many secondary economic effects (lost production, clean- 
up, etc.) were included and their costs conservatively estimated; for 
example, the modeling approach of ORI allowed the entire gross domestic 
product of an area to be lost. In general, a modeling philosophy of 
"reasoned conservatism" was followed. 

(2) Numerical inputs for the model . The model of CF risk requires 
many numerical and other quantitative inputs; among these are accident 
rates, positions of vulnerable equipment, weather frequencies, air traffic 
projections, building transfer coefficients, failure probability distribu- 
tions, etc. Most of these quantities must be estimated from data and then 
projected to the year 1993, which leads to only approximate values of 
these quantities for use in the model. A conservative approach was gener- 
ally taken. For example, for factors such as CF release and building 
penetration, conservatively high values were used. For many other factors, 
sensitivity analyses showed that imprecise knowledge of their true values 
had minor effects on the final estimate of the risk profiles. One sensi- 
tive parameter, however, is the equipment failure distribution. Equipment 
failure data and theoretical considerations showed that exponential fail- 
ure laws were appropriate in many cases and conservative in others. 

Thus the use of exponential failure laws in the model leads to conserva- 
tive results. Further, use of vulnerable 1979-vintage equipment results 
in conservative approximations to 1993 electronic equipment. 

(3) Simulation sampling error. The risk model is exercised by 
simulation. This is equivalent to drawing a statistical sample from a 

- 3 - 
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population, and attributes of the population are then estimated from the 
sample. When the computer simulation sampling is done properly, it is 
possible to use modifications of classical statistical techniques to 
generate accurate bounds on the sampling error, which permits the estab- 
lishment of confidence bounds on the risk profile. 

To summarize: by performing careful analyses it is possible to 

exert control over the effects of various sources of the added uncer- 
tainty (or error) which can influence the estimate of the risk profiles. 
While the statistical aspects of model building and simulation interpre- 
tation are not well developed enough to give precise conclusions on the 
total error in the final answers, it is possible to analyze the errors 
individually due to the separate sources and to control them. Figures 
2-1, 2-2, and 2-3 illustrate this process. The interaction of the many 
sources is a problem, but a fairly strong degree of confidence in the 
conservatism of the final conclusion arises from the conservative and 
statistically sound approaches taken to control the individual error 
sources. 

3. Treatment of Imprecise Knowledge 
in Model Conception 

Any model is an abstraction of and approximation to the real 

world. In constructing a model, often a trade-off is necessary be- 
tween "realism" and tractability . Further, judgements are often re- 
quired in making certain assumptions concerning how factors behave and 
relate. 

This section deals with two such examples, namely the use of 
deterministic values in place of random variables in order to gain model- 
ing efficiency, and the choice of an appropriate probability distribu- 
tion for representing equipment failure. 


- 4 - 
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RISK ASSESSMENT PROCEDURE 



Figure 2-1. — Risk -profile methodology. 
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RISK ASSESSMENT PROCEDURE 



Figuve 2-2 — Potential errors affecting the risk 
profile methodology . 
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RISK ASSESSMENT PROCEDURE 
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Figure 2-3 — Approaah.es to handling errors in 
risk profile methodology. 
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3. 1 Use of expected or deterministic values 
in place of random variables 

The simulation analysis of CF damage and its cost is based to a 
large extent upon various averages, but the goal is to estimate (with 
confidence) risk probabilities. This section focuses on the effect that 
variation of input data which are assumed constant can have on the sim- 
ulation conclusions. 

Average values have been used instead of random ones for repair 
and downtime costs in the current simulation modeling effort. Consider- 
ing the categories and types of repair and disruption costs, it seems 
that each would tend to be quite variable. A coefficient of variation 
(a/p) equal to 2 or 3 would not be unreasonable for most of these costs. 
The question is: What is the effect of using variable costs in inputs 

instead of expected costs? 

The possible magnitude of such a change in the model can be seen 
for the variance of the national conditional risk profile by considering 
the following analysis of a much simpler problem. Let X equal the to- 
tal economic loss or cost, given an accident at some particular airport. 
Roughly speaking, X equals the sum of a large number of individual 
costs . . . ,L n , where N is random: 

N 

X = I L . 
i=l 

Now consider the distribution of X , in particular its mean and vari 
ance. Making the simplification for illustrative purposes that the L ± 's 

are independent and have common mean y^ and standard deviation 0^ , 
we have 

E [X] = E[N]y L 

Var [X] = E[X 2 ] - (E[X]) 2 , 

■ E [( L i ) ] - < E[nl \> 2 

- 8 - 
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= E 


[1 1 ‘ * U ■*] - 


(E[N]) 2 y L 2 


E[N]E[L 2 ] + l ((n 2 -n) (E[L] ) 2 P(N=n) - (E[N]) 2 y 2 
n=0 


= E[N]E[L 2 ] + (E[N 2 ]-E[N]) (E[L]) 2 - (E[N]) 2 y 2 

- E[N] [ E[L 2 ]- (E[L]) 2 ] + (E[L]) 2 [ E[N 2 ]-(E[N]) 2 ] 
= E[N] (a 2 ) + y 2 a 2 . 


Therefore, 


Now consider 


Var[Xl - + l£>J 


(3.1) 


N 


X’ = l E[L] = N E[L] = Nn 
i=l L 


which is analogous to computing risk by assuming the costs are fixed at 
their expected values; 


E [X 1 ] = E[N]y L , 
Var [X* ] = cr 2 y 2 . 


From the accidental nature of the process generating the cost it is not 
unreasonable to assume that N has a Poisson distribution. Then 



Suppose the coefficient of variation of the cost distribution equals k , 
that is, . 


a i>L = k » 


or 

Then 

and hence 


a L = ky L . 

Var [X] = (l+k 2 )y N y 2 , 


Var[X] = (1+k )Var[X'] . 


- 9 - 
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Thus if the individual costs have coefficient of variation k, the true 
variation of risk is 1+k 2 times the value obtained using expected costs. 

The above is not meant to be a precise analysis for the actual 
system under consideration. However, the two are close enough that the 
above analysis may roughly apply. Namely, if there is much variation in 
costs, it can have a pronounced effect on the variation of the total 
risk. It is not unreasonable to believe the costs could have a coeffi- 
cient of variation equal to 2 or 3, and in this case the standard devia- 
tion of X will be approximately 2.24 or 3.16, respectively, times as 
great as that of X' . 

We have seen that in adding up a random number of randomly dis- 
tributed costs, if the costs are assumed to be fixed and nonrandom, a 

term is ignored, and that this can be serious if y N is of the 

O 

same order of magnitude as and CT^/y^^l • If N is Poisson this 

is true. To get some idea of what the assumption of fixed costs would 
mean in that case, risk curves for two cases are plotted in Figure 3-1. 
one for fixed costs of $1 per failure and one for a distribution of costs 
for each failure, $0 and $2 being equally likely. A normal approxima- 
tion to the Poisson is used. 

Note that, in a sense, the severity of the error by assuming 
fixed costs depends on how the graphs are used. There are two quantities 
that can be read off these graphs: tail probabilities and tail 
percentiles . 

Tails probabilities : Suppose one is interested in the probability 

of the damage exceeding $140. The correct answer is .0024, the incorrect 
answer is .000033, almost two orders of magnitude too optimistic — a 
rather bad error. 

Tail percentiles: Suppose one is interested in the 99.99th per- 

centile. The correct answer is $153. The incorrect answer is $137 — not 
a very great error. 


- 10 
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Thus even though the correct variance is double the incorrect var 
iance , for certain purposes it may not be too serious a mistake. 


The reason we can obtain a good estimate of percentiles from the 
incorrect curve is that the slopes are steep. So the question arises: 
What happens when the slope of the curve based upon fixed costs is not 
steep? Such will only arise for a risk distribution with a heavy tail 
which implies a high coefficient of variation o ^/ Pjg > for illustrative 

purposes let's suppose o ^/ = 3 . However, another factor now must be 
considered, namely, the magnitudes of and • Reconsider Equa- 

tion (3.1), namely, 

Var [ X] - + 1^4 • 

If we can assume the total cost incurred during a year equals the sum of 
many small costs (perhaps, on the average, 100 such costs), and further 
assume that they are roughly independently and identically distributed, 
then in (3.1) = 100 and a N = 300 . Furthermore, if we assume a 

worst cast of o / = 10 , then the summands of (3.1) satisfy 


vM 


m l n 


Thus using fixed costs ignores approximately 10% of the variation. 


The tentative conclusion is this: If one can assume that the total 

cost is the sum of many small costs and the number of these small costs 
incurred is distributed with o ^/ ^ > 1 , then one can get a fairly good 

estimate of the variance by using the average costs rather than the dis- 
tribution of costs. We believe this could be made precise in a manner 
analogous to proofs of the central limit theorem for sums of indepen- 
dent but not identically distributed random variables. However, the 
above possibilities hold only when the true mean costs are used. If 
these are not known exactly, another source of variation is introduced. 


- 12 - 
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3. 2 A general model of electronic 
equipment failure 

We examine a simple, but fairly general, stochastic model of the 
process by which a piece of electronic equipment (hereinafter called the 
"item") may fail to function properly because of exposure to carbon 
fibers (CF). The model assumes the item to be composed of n indepen- 
dent subsystems or "circuits," each of which receives the same amount 
of exposure E to CF, thereby causing it to receive "shocks" or "hits" 
according to a Poisson stochastic process. Circuit j is assumed to 
fail after receiving r^ shocks, j=l,...,n , and the item is assumed 

to fail when s circuits have failed; only the cases s=l and s=2 
are examined in detail. Also, we only examine closely the cases in 

which all r. are the same. 

J 

For the various cases considered we compute F(E) , the probability 
that the item fails due to the exposure E . The algebraic form of F(E) 
indicates the probability distribution of the exposure level at which 
the item fails; for certain combinations of the parameters n,s and r^ 

this distribution is either exponential or Erlang-r (for r > 1) . For 
other combinations of the parameters the probability distribution is not 
one of the well-known types. 

For all cases considered we examine the asymptotic behavior of F(E) 
as E decreases to zero. It is found that F(E) is approximately equal 

to cE^ for small values of E , where the constant c and the positive 
integer q depend on the particular case as well as the parameters. An 
exponential failure distribution yields q=l while an Erlang-r failure 
distribution yields q=r . 

Our conclusions are: (1) many different failure distributions 

arise from different choices of the model parameters; (2) examination 
of F(E) for small values of E is not a good guide for inferring the 
probability distribution of failure of the item. 


13 - 
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3.2.1 The general model 

The model is based on the following assumptions. The item consists 
of n independent circuits, each subjected to an exposure E . Circuit 
j (j=l n) is characterized by the parameter a^ , where l/a^ is 

the average amount of incremental exposure between consecutive shocks of 
circuit j . More precisely, we assume that for every value of E in 
the interval (0,<») , the probability distribution of the number of shocks 
of circuit j during the period of time during which the exposure has 
accumulated to a value of E is the Poisson distribution given by 


“ a 1 E X 1 

P.(x ) = e J (a.E) J /x.! , x.=0,l,2,... 

J J 3 3 3 


(3.2) 


-a E 

In particular, P^ (0) = e J 


It is assumed that circuit j has failed due to the exposure E 
if it has received r^ or more shocks, where r^ (assumed given) has one 

of the values 1,2,... . Let F^(E) be the probability that circuit j 

fails due to the exposure E . Then 


F (E) = 


l yv - 1 - l 

,=*r. J J x =0 


r j -1 -a.E 


e J (a.E) J /x, I 


It is assumed that the item has failed due to the exposure E if 
s or more circuits have failed, where s (assumed given) has one of the 
values l,2,.,.,n . Let F(E) be the probability that the item fails 
due to the exposure E . We will initially write expressions for F(E) 
for three fairly general cases and then analyze various cases in detail. 

For s=l we have 


r .-I 

n j -a.E 


1 - F(E) = n fl-F (E)] = n y e j (a.E) j /x.! . (3.3) 

j=l J j=l x.=Q 3 3 
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For s=2 we have 


l-F(E) = Pr{0 circuits failed} + Pr{l circuit failed} 
n n 

= n [l-F.(E)] + I F (E) n [1-F (E) ] 
j=l J k=l k j^k J 


(3.4) 


r .-1 

n j -a.E x 


r .-1 „ 

n j -a E 


y n y e 3 (a E) 3 /x.! - (n-1) II £ e 3 (a E) j, , 

=1 j/k X j=0 3 3 j=l x^.=0 ' j ' 


k= 


For general s we consider only the case in which r j =r anc * 


a ,=a 
J 

for all j=l,...,n . Then 

r-1 „ 



Fj(E) = F jV (E) = 

1 - y e a (aE) X /x! , 

(3.5) 


x=0 


and 

s-l / 

i - fce) = i (j 

!) [F^(E) ] k [l-F^(E) ] n_k , 



k=0 Vl 


and 

F(E) - I (“) 

[F A (E)] k [l-F A (E)] n_k . 

(3.6) 


k=s 



3.2.2 Case 1: r =1 for all 


We 


first treat the situation in which s-1 


For r =1 we have 
J 


-a.E 

F.(E) = 1 - e 2 . Then, from (3.3), we find 


where 


✓ x -naE 

F(E) = 1 - e 

-t n 

a 2 i l a. 
n . 1=1 J 


(3.7) 

(3.8) 


is the average shock rate. Equation (3.7) shows that the probability 
distribution of failure is exponsni'^cd . 
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We now examine the asymptotic behavior of F(E) as E decreases 
to zero. In general, for these asymptotic analyses, we will write 

F(E) ~ cE q to indicate that F(E) /E^ approaches the constant c as 
E decreases to zero. The method used will almost always be to replace 

terms of the form e ^ by the power series ]^_Q(-kE) / i i an d then to 

collect coefficients of the various powers of E . Details will be 
omitted. The present case of the exponential distribution (s-1, all 
r =1) is the easiest, and yields, from (3.7), 

F(E) - naE ; (3.9) 

i.e., F(E) is asymptotically linear in E . 


Now we treat the situation in which s=2 . Direct substitution 
into (3.4) yields 

F(E) = 1 - e _naE £ l e 3 ^ - (n-l)J . (3.10) 

This expression does not correspond to the cumulative distribution func- 
tion (CDF) of any well-known probability distribution. Asymptotic anal- 
ysis of (3.10) yields 

F(E) ~ cE 2 , (3.11) 


where 


c = (n 2 a 2 - ^a 2 )/2 , so F(E) 


is asymptotically quadratic in E 


Specializing expressions (3.10) and (3.11) to the case a j =a f° r 
all j leads to simpler expressions for F(E) and c , but the former 
still does not correspond to the CDF of any well-known distribution. 
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3.2.3 Case 2: r.=2 for all j=l,...,n 

J 


-a.E 


Now F j( E ) = 1-e ^ (H-a^E) . For the case s=l expression 


(3.3) leads to 


n 


F(E) =* 1 - e" naE n (1+a.E) . 

j=l 2 


(3.12) 


This is not the CDF of any well-known distribution, even if it is spe- 
cialized to the case a j =a f° r a H 3 • Asymptotic analysis of (3.12) 

yields 


F(3) ~ cE , 


(3.13) 


where c = (^a^.)/2 , so here again F(E) is asymptotically quadratic 
in E . 


F(E) = 1-e 


For s=2 expression (3.4) leads to 
-naE 


L k=l jtft 2 

and asymptotic analysis of (3.14) yields 


n a, E n 

l e K n (l+a^E) - (n-1) n (l+a,E) 

j=l 


3 


, (3.14) 


F(E) ~ cE , 

where 

c ■ ( X a j a k )/4 • 

3<k J 

For the case a j =a For all 3 » c he constant c simplifies to 


(3.15) 


n(n-l)a /8 . 


3.2.4 Case 3: n=l 

In this case the item is treated as being composed of only one 
circuit. We now replace r. by r and a. by a , and must have s=l . 
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Then expression (3.3) yields 

r-1 v 

F(E) = 1 - I e" ah (aE) X /x: = F*(E) . (3.16) 

x=Q 

Differentiation of (3.16) with respect to E yields the density function 

f (E) = dF(E)/dE «• a(aE) r_1 e' aE /(r-l)! , (3.17) 

which shows that the failure distribution is Evlang-r. Asymptotic analy- 
sis of (3.16) is a bit more complicated in this case, but eventually 
yields 

F(E) ~ cE r , (3.18) 

where c = a r /r! . Thus F(E) is asymptotically linear in E for r=l 
(the exponential case), quadratic in E for r=2 , etc. 

3.2.5 Case 4: general s 

Expression (3.6) gives F(E) for the case r j =r an< ^ a j =a ^ or 

all j , and this cannot be significantly simplified, even for r=l . 

For the asymptotic analysis we note that s of the n circuits must 
fail in order to cause failure of the item, and each of the s failing 

circuits has F^ (E) ~ a r E /r! . Thus 

F(E) ~ cE rS , (3.19) 

where 

c - (")(a r /r:) S (3.20) 

It is not difficult to see that expression (3.19) also holds when the 
a^ are different; the constant c is then no longer given by (3.20), 

but is instead more complex to express. What is important, of course, 

is that F(E) ~ cE q , where q = rs . 
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3.2.6 Case 5: asymptotic results 

for the general case 

For the asymptotic analysis the argument is still valid that s 

of the n circuits must fail in order to cause failure of the item, 

r r 
1 1 

and now we have (E) ~ E J / r^J . Asymptotically, therefore, it 
is not difficult to see that 

F(E) ~ cE q , (3.21) 

where c is a constant whose exact value is algebraically messy to write 
down and 


q “ I r , (3.22) 

P-1 J P 

where the r. are ordered so that 
J , 


r. <r. < . . . < r . 

J 2 = = 

Expression (3.22) reduces to q=rs if the s smallest r^ values are 
all equal to r . 


3.2.7 Case 6: limiting case 

for s=l and large n 

We consider only the case where r_.=r an< * a j =a ^ or ^ * 

As discussed by Mann, et al. (1974), pp. 102-108, as n approaches 
infinity the limiting failure distribution is Weibull with shape para- 
meter r . This has CDF 

~cE r 

F(E) = 1 - e b , (3.23) 

where c is a constant. For r=l the Weibull distribution is the same 
as the exponential. The asymptotic form of (3.23) is clearly 

F(E) ~ cE r . (3.24) 
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3.2.8 Conclusions 

The failure model examined leads to different failure distributions 
when different choices are made for the parameters n , s , r^ , and a^ . 

For s=l and all r^=l we obtain the exponential distribution (Case 1) . 

For n-1 (so that s=l and r^=r) we obtain the Erlang-r distribution 

(Case 3). For n large, s=^l , and r j“ r anc * a j ==a ^ or J * t ^ ie 

failure distribution is approximately Weibull with shape parameter r 
(Case 6). For the other combinations examined, the failure distribution 
is not one of the standard ones. For a particular piece of electronic 
equipment, examination of the circuitry and physical arrangement of the 
components may help to establish which combinations of the model parame- 
ters are most appropriate (if indeed any of them are). 

Asymptotic analysis indicates that the failure probability F(E) 

behaves like cE^ for very small values of E , where c is a constant 
and q is a positive integer. The exponential failure distribution is 
the only one for which q=l ; for all other cases q >1 . Both the 
Erlang-r (Case 3) and Weibull-r (Case 6) distributions yield q=r , but 
these are not the only failure distributions with q=r ; for example, 
q=2 also arises (a) with s=2 and all r^l ( Case 1) » (b) with s=l 

and all r^=2 (Case 2), and (c) in the general Case 5 with 


s 



Since the same asymptotic form of F(E) arises from a number of differ- 
ent failure distributions, it is clearly inappropriate to infer the type 
of failure distribution from an examination of F(E) [or an estimate 
F(E) of F(E)] for small values of E . 
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It Is important to note that, except for the exponential (s=l 
and all r^l), all t * ie cases examined here lead to failure distributions 

with increasing hazard rate (IHR). The analyses and remarks of Section 
4,2 on testing for the exponential failure distribution are therefore 
appropriate. 

4. Treatment of Parameter Estimation 

A second major source of error in probability modeling is due to 
having to estimate parameters of probability distributions and in some 
case the probability distributions themselves. This section treats some 
estimation problems and the sensitivity of the risk analysis models to 
some of the estimated values. 

4.1 Estimating the parameter of the 
exponential failure model 

For the exponential failure model used in the GFRAP risk analysis 
it is necessary to know the value of the parameter E . Since the actual 
value of the parameter can never really be known, the usual procedure 
is to use an appropriate estimate obtained from test data. The estimate 
most commonly used is the maximum likelihood estimate; we develop that 
here. We will use a Bayesian approach to treat the case in which no 
failures have been observed. 

Suppose m+n identical pieces of equipment have been exposed to 

graphite fibers (or perhaps the same piece of equipment ra+n times) 

and m of them ]iave failed at exposures while the other n 

1 m 

have survived exposures of E E , . Either m or n could be 

m+1 m+n 

zero. Based on the exponential failure model P = PrCltem survives 
-* (E /e^ 

exposure E) = 1-e , it is straightforward to show that the likeli- 

hood of this sample result A is given by 
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X(k) = 


[ m -(E /E)"irm4ti -(E /E)' 

n e" 1 e n e 

i=l J Li=m+1 


= "-in e -(E*/E) , 

where E* = E ^ is the total exposure. The maximum likelihood 

estimate of E , call it E , is the value of E which maximizes df(A) ; 
it is easily found to be 

A 

E - E*/m 9 

provided m > Q . If m=0 , i.e., no failures have been observed, there 
is no maximum likelihood estimate for E ; we turn therefore to the 
Bayesian approach. The Bayesian approach uses a prior distribution on 
E and combines it with the likelihood o^(A) via Bayes 1 theorem to ob- 
tain a posterior distribution on E • The mean of the posterior distri- 
bution is often taken as a point estimate of E . 


A convenient prior distribution to use is one that is a natural 
conjugate of the likelihood <>\(A) , and in this case that is an inverted 
gamma— 1 distribution [see Raiffa and Schlaifer (1961) , Chapter 10] of the 
form 


K'i-”'- 1 e- (E ' /E) 


where m* > 0 and E* > 1 are parameters and K' is a normalizing 
constant. The mean of this prior distribution exists if m' > 1 and 
is E'/Cm'-l) . When this prior distribution is combined with the 
sample likelihood of (A) via Bayes* theorem, the posterior distribution 
is easily found to be inverted gamma - 1 also, with parameters m* = m*+m 
and E* * = E'+E* . The posterior mean is thus (E*+E* )/(m+m' -1) • 


How should the prior parameters m' and E* be chosen? The 
following rationale is admittedly ad hoo i but it is rational and it 
leads to a reasonable and useful estimate of E . For m > 0 , a 
Bayesian analysis is not needed since we will use the maximum likelihood 

A 

estimate E = E*/m , so we are only concerned with the case m=0 . 
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For the case m=0 the posterior mean is (E*+E' ) /(m'-l) . If 
one failure had been observed along with the total exposure of E* , 

A 

we would use the maximum likelihood estimate E = E*/l = E* . We will 
therefore choose the prior parameters m' and E' so that with m=l 
and the same total exposure E* , the posterior mean would have this same 
value E* . This yields (E*+E' )/m' = E* , or E' = E*(m’-1) . 


We will choose m' so that the standard deviation of the prior 
distribution is k times the prior mean. This requires m' > 2, and 
yields the equation 


[Var IK, E')]- 5 ' = r = 

(m'-l) (m' -2)’ 5 

When this is combined with the previous equation E T 
posterior mean is found to be 

E*(2k 2 +l)/(k 2 +l) . 


kE' . 
(m'-l) 

= E*(m'-1) , 


the 


To indicate a good deal of prior uncertainty about the true value of 
E it is appropriate to chose k rather large. Even for k=2 the 
above expression yields a posterior mean of 1.8E* , and as k tends 
toward infinity m 1 tends toward 2, E f tends toward E* , and the 
posterior mean tends toward 2.0E* . It is suggested that this value 
be used as an appropriate estimate of E for the case of no failures; 


i.e., use E = 2E* if m=0 . 


There is an alternate rationale for choosing m T and E f , 
again ad hoo, that leads to the same prior parameters (m T = 2 and 
E 1 = E*) and hence the same value of the posterior mean, 2E* . One 
may imagine the value E* as having been chosen as the maximum total 
exposure for the experimental testing because it was felt that this 
much total exposure would yield at least one failure. From the rela- 
tions m f f -l = (m-l)-hn and E 1 1 = E ? +E* one can interpret the prior 
information implied by the choice of m r and E f as being equivalent 
to having observed (m’-l) failures caused by a total exposure of E 1 . 
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The choice of E* as hypothesized just above then suggests the choices 
(m'-l) = 1 and E f = E* . 

Finally, let us note that if no failures have been observed and 
a conservative estimate of E is desired, then E - E* is appropriate 
since this is equivalent to conservatively assuming that one failure 
has occurred. 

4.2 Testing for an exponential failure 
distribution 

We were given failure data for equipment exposed to carbon fibers. 

i i • 

Twenty - one separate experiments were conducted using different electrical 
components and/or different lengths of types of fibers. Each experiment 
was replicated several times (the number varied between 3 and 11) . The 
equipment, fiber, and performer of these experiments are summarized in 
Table 4-1. 

Using the data, we wish to test the hypothesis that an exponential 
distribution is a reasonable failure model for electrical equipment ex- 
posed to graphite fibers, i.e., that the probability p of failure is 
related to the exposure x by the functional relationship p = F(x) = 

1 - exp (- Ax) , for some X , where 1/ X = mean exposure to failure. 

The exponential distribution is used widely in reliability studies , 
and consequently there is a considerable body of literature concerned 
with estimation and testing related to the exponential distribution. A 
good discussion is found in the book by Mann, Schafer, and Singpurwalla 
(1974), Ch. 7. 


4.2.1 The Kolmogorov-Smirnov- 
Lilliefors test 

Many test are based on the empirical cumulative distribution 
function. The most widely used of these appears to be the modification 
of the Kolmogorov-Smirnov test due to Lilliefors (K-S-L) . We now 
describe it. 
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TABLE - 

i-1 





FAILURE DATA: 

21 TESTS 

OF EQUIPMENT 


1 . 

Avionics; terminal blocks 

7.5 

mm 

AS 


BRL 

2. 

Dynaco amplifier 

7 

mm 



Mike Vogel 

3. 

Dynaco 

7 

ram 



Mike Vogel 

4. 

Dynaco 

3.5 

ram 



Vogel 

5. 

Dynaco 

15 

ram 



Vogel 

6. 

Dynaco 

15 

mm 



Harvey 

7. 

Dynaco 

3.5 

ram 



Mike Harvey 

8. 

Dynaco 

7 

ram 



Mike Harvey 

9. 

Dynaco 

1 

mm 

GY70 


Phillips 

10. 

LSI-11 computer 

4.5 

mm 



BRL 

11. 

LSI-11 computer 

7 

mm 



BRL 

12. 

19" TV 

8 

mm 

HMS 


BRL 

13. 

19" TV 

8 

mm 

T300 

sized 

BRL 

14. 

19" TV 

8 

ram 

T300 

unsized 

BRL 

15. 

Transponder 

10 

mm 

GY70 


Bionetics 

16. 

Transponder 

3 

ram 

GY70 


Bionetics 

17. 

Sunbeam toaster 

7 

mm 

T300 


Bionetics 

18. 

Sunbeam toaster 

3 

mm 

GY70 


Bionetics 

19. 

Heritage House 

12 

mm 

T300 


Bionetics 

20. 

Heritage House 

7 

mm 

T300 


Bionetics 

21. 

Sunbeam toaster 

12 

mm 

T300 


Bionetics 
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Suppose we test n items. For each item, note the exposure level 
x^ at which it fails. (Note: all items must be exposed until they 

fail; the results described here are not valid if items are withdrawn 

before they fail.) Thus we get n failure values: ,x„ 

1 l n 

The empirical distribution function is 

* 1 

F (x) = — x (number of failure values < x) 

n n = 

(see Figure 4-1). This function is an unbiased estimate of the underly- 

A 

ing CDF, F ; i.e., E(F (x)) = F(x) , for all x . (It has mathemati- 

n 

cal properties which enable one to use it in test statistics and to 
make statements about the possible inference errors of such procedures, 
e.g., probability of false rejection of a true hypothesis.) 

The usual procedure is to look at the maximum deviation between 

A 

F^ and the hypothesized CDF F^ . However, we are not hypothesizing a 
single distribution, but an entire family, the exponential family 


y 


l.Q 


.5 


0 



-i x * — x x — x x — *~X X 

failure values 


Figure 4-1. — The empirical cumulative distribution function. 


(F^ : F^(x) = 1 - e , any X}* In this case we estimate the para- 

meter X from the data (for the exponential X = n / J ^ x^) , and 


then look at the maximum deviation between the empirical CDF F^ and 
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. A 

the exponential CDF F^ , with estimated parameter A , i.e., we look at 

d = sup |F (x) - FC'(x) | , 

O^x^ n A 

If d is large, we reject the hypothesis of exponentiality. If d is 
small, we accept it. The critical values depend on n and on a > the 
desired probability of false rejection. They are given by Lilliefors 
(1969) . For example, if n=6 and we desire only a 5% chance of falsely 
rejecting the hypothesis of exponentiality we should reject if 

A 

sup |F n (x) - F~(x) | > .406 . 

X 

We have applied the K-S-L test for exponentiality to four of the data 
sets obtained from NASA Langley, They are graphed in Figures 4-2 through 
4-5. On all four, the hypothesized (best-fitting) exponential CDF is 
plotted, and then upper and lower bounds are given, based on that expo- 
nential CDF ± the critical value (.406, in the case of 6 observations). 

If the empirical CDF falls in this region, we accept the hypothesis of 
exponentiality. Otherwise we reject it. Note: for set #1 (Figure 4-2), 
we just barely reject exponentiality at the a- 5% level (i.e., we would 
expect such a deviation to occur due to chance alone to be an event with 
probability < * 05) ; for set #2, we accept exponentiality, even though the 
best fitting Erlang has shape parameter 2 or 3; for set #15 (Figure 4-4), 
the data is highly significant: reject exponentiality; for set #16 we 

accept, but just barely, even though Erlang-4 is the best fitting. In 
selecting the data sets to analyze, we picked the ones which seemed to 
deviate the most from being exponential. The results do not constitute 
an overwhelming rejection of exponentiality. Among 21 sets of data, 

§J1 drawn from exponential populations, it would not be surprising to 
find 2, say, which are significant at the 5% level. 

To give some idea of how difficult it would be to distinguish 
between an exponential distribution and an Erlang- 2, we generated a sample 
of size 10 on an HP- 25 calculator and then normalized the values so the 
sample mean was 1. The plot of the exponential CDF, the Erlang-2 CDF, 


2 7- 




Figure 4-2. — Data Set # 1 : avionics terminal blocks 3 7.5 mm. 
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and the empirical CDF drawn from the exponential are given in Figure 4-6. 
Note that for such a small sample, neither distribution is an obvious 
choice. 

One property of the K-S tests is that they have very low power. 
"Power" is defined as the probability of rejecting the hypothesis when 
it is false. If the true distribution is Erlang-2, we estimate that the 

A 

chance of rejecting the hypothesis of exponentiality, using a sample of 
size 10 and a = .05 , would be .20 or less. 

4.2.2 The cumulative total-time-on-test 
statistic 

The low power of the K-S-L test motivates the search for a more 
powerful test, i.e., one that is more likely to reject the hypothesis of 
exponentiality when it is false. One way to increase power is to con- 
struct a specialized test which is especially good at rejecting exponen- 
tiality when a certain class of alternatives is true and to demonstrate 
that this restricted class of alternatives includes all possibilities: 
roughly speaking, it is equivalent to saying that you have a better 
chance of making the right decision if there are fewer alternatives from 
which to choose. 

For the failure distributions under consideration it is reasonable 
to assume nondecreasing hazard rate (IHR). Let F be the CDF of a 
random variable and f its density, then r(x) = f (x)/(l-F(x)) is the 
hazard rate. It follows that Pfx^X^x+dx | x.<X} ~ r(x)dx . The dis- 
tribution is IHR if r is nondecreasing. If r is constant, then we 
have an exponential distribution, which is considered a boundary case of 
IHR. IHR is a reasonable assumption for the distribution of failure 
probabilities as a function of exposure; it is equivalent to saying: if 
two components (#1 and #2) have survived exposure levels e^ and e^ > 

respectively, e^ < , then the next increment of exposure, Ae , is 

more likely to cause failure to #2 than #1' i.e., r(e^)Ae < r(e 2 )Ae 

for e^ < e^ • 
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Thus we are in a situation of testing the hypothesis of exponen- 
tiality versus the alternative of IHR, The most powerful test (we know 
of) for this situation is discussed by Barlow (.1968) . It is actually a 
test for exponential ys f increasing hazard rate average (IHRA), A test 
which is virtually as powerful is a test based on the time-on- test sta- 
tistic. This test has other desirable features, such as applicability 
to censored samples and a nice statistical distribution theory which 
makes it appear more attractive. It is described in detal by Barlow, 
Bartholomew, Breraner and Brunk (1972), Secion 6.2. We apply this test 
to data set #16 (plotted in Figure 4-5), the transponder exposed to 3 mm 
fibers to GY 70 (Bionetics data) • The 10 normalized failure times 
(exposures) are 

.36, .56, .58, .85, 1.05, 1.06, 1.09, 1.26, 1.42, 1.78. 

A test statistic is computed as follows: let X, be the i th ordered 

i : n 

failure value, let D. = (n-i+1) (X, - X. - ) be the time (exposure) 

’ i:n i:n i-l:n 

on test accumulated between the (i-l)s£ and the 1th failure values. This 
test uses the fact that 


PfD i:„ > D j:n> = 2 • 1 * J > 

under the assumption of exponentiality , while under the assumption of IHR, 

D- > D n > D_ > ... > D ; 

l:n = 2:n ~ 3:n = = n:n 

st st st st 

Thus under exponentiality the D^ #n 's will tend to have 0 slope as a 

function of i ; under IHR the slope is negative. Thus this becomes a 
regression problem of testing for zero slope vs. negative slope. The 
appropriate statistic is 


n 


i ^ _ -I ^ 

= n’ 1 y (i-l)D . / n 1 Y D 

.4- v 7 n- l+l : n i:n 

i=l i=l 

= n' 1 n f[ l D I / n _1 l D 

i=l Lj = l i=l i,n 

= cumulative-total-time-on-test statistic. 
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We reject exponentiality for large values of V r . Critical values of 

V are given by Barlow et ah (1972), page 269; for n=10 they are as 
n 

follows: 


C* .10 .05 

critical value 5.619 5.927 

For data set #16 we get the following: 


.025 

6.189 


.010 

6.487 


.005 
6.683 . 


i 


i:n 


D 


i:n 


T n (X i 


) = l D. 

n j=i 3 :n 


V 

n 


1 

.36 

3.60 

3.60 

2 

.56 

1.80 

5.40 

3 

.58 

.16 

5.56 

4 

.85 

1.89 

7.45 

5 

1.05 

1.20 . 

8.65 

6 

1.06 

.05 

8.70 

7 

1.09 

.12 

8.82 

8 

1.26 

.51 

9.33 

9 

1.42 

.36 

9.69 

10 

1.78 

.36 

10.05 




67.20 

n-1 




I 

i=l 

T (X. ) 

n i:n 

/ T (X ) 
n n:n 

= 67.20/10.05 = 6.72 


This corresponds to significance at approximately a ^ .005 . Thus 
we reject exponentiality in favor of IHR. (We accepted it using K-S-L.) 
Thus using this criterion (the cumulative- total-time-on-test statistic) 
we have observed a deviation from exponentiality which would occur by 
chance with probability .005. Note that in the K-S-L, the maximum devia- 
tion hit the lower edge of a 95% two-sided region; this lower boundary 
would be an approximate boundary for an a = 10% level one-sided test. 
Thus the K-S-L criterion says such a deviation will occur approximately 
10% of the time purely by chance. The extra sensitivity of the 
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cumulative- total- time-on- test statistic is due to the fact that it is 
tailor-made for testing the situation in which we are interested, while 
the K-S is applicable to a broader range of situations and thus should 
not be expected to compete favorably with special tests in special 
situations • 

We applied the cumulative-total-tirae-on-test statistic to all 21 
sets of data. The results are tabulated in Table 4-2. Considering each 
experiment separately, there are four cases where exponentiality can be 
rejected and three questionable cases. Exponentiality is a plausible 
hypothesis for the remaining cases. 

These conclusions are based on considering each test separately. 
Exponentiality was rejected when the test statistic took an improbable 
value (in the 1% extreme tail). However, in 21 trials the chance of an 

21 

even of probability .01 occurring at least once is 1 - (.99) = .19 . 

Thus all 21 populations could have exponential and yet there would be a 
.19 chance of rejecting exponentiality for at least one of the 21. Thus 
to be rigorous and be sure that the chance of making such an inference 
error is .05, say, or less, we must reject at the .00244 level of 

21 

significance (since 1 - (.00244) = .05) . In this case exponen- 

tiality is rejected for only two cases. 


It is possible to construct a joint test of exponentiality using 


£ 

the time-on-test statistic. Note that V = u_ + u 0 + . . . + u - , 

n 1 z n— x 

where u^ 1 s are independent uniform [0,1] random variables, under the 
assumption of exponentiality. This fact allows us to construct a test 
for exponentiality of all 21 sets: 


Hq : all are exponential; 

H- : all are IHR, with at least one being 

strictly IHR (i.e., not exponential). 


Suppose the i th data set consists of 


observations; let 



be 
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TABLE 4-2 

TESTING FOR EXPONENTIAL VS. IHR USING CUMULATIVE- 
TOTAL-TIME-ON-TEST STATISTIC 


Test # 

Number 

Observ. 

Value of 
Statistic 

Expected 
Value 
Given Hq 

Level of 
Significance 

Conclusion 

1 

6 

3.86 

2.5 

.015 

? 

2 

11 

7.11 

5.0 

.01 

Reject 

3 

5 

3.13 

2.0 

.025 

? 

4 

6 

1.90 

2.5 

o 

00 

• 

Accept 

5 

5 

1.44 

2.0 

.81 

Accept 

6 

4 

1.58 

1.5 

.44 

Accept 

7 

4 

1.02 

1.5 

.79 

Accept 

8 

4 

1.95 

1.5 

.22 

Accept 

9 

4 

1.88 

1.5 

.25 

Accept 

10 

5 

2.88 

2.0 

.07 

? 

11 

10 

5.36 

4.5 

.17 

Accept 

12 

5 

1.09 

2.0 

.92 

Accept 

13 

4 

1.71 

1.5 

.36 

Accept 

14 

3 

.90 

1.0 

.58 

Accept 

15 

8 

5.70 

3.5 

.002 

Reject 

16 

10 

6.72 

4.5 

.005 

Reject 

17 

10 

8.38 

4.5 

~ .000008 

Reject 

18 

10 

3.94 

4.5 

.73 

Accept 

19 

10 

4.39 

4.5 

.55 

Accept 

20 

10 

4.32 

4.5 

.58 

Accept 

21 

10 

2.88 

4.5 

.98 

Accept 
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the cumulative-total-time-on-test statistic for the ith data set. Then 


V 


21 


y V 


i=l V 1 


is an aggregate statistic, which if is true is. the sum of 

21 

£ (n -1) independent uniform [0,1] random variables and consequently 
i-1 1 

approximately Normal with easily calculated mean and variance, from 
which critical values can be computed. 

In light of data set #17 (see Table 4-2), it is obvious that : 

all exponential must be rejected. Considering #17 an outlier or an 
anomaly, we threw it out and tested the hypothesis : all exponential 

for the remaining 20 data sets. The test statistic 


V 


21 

I 

±=i 

#17 


V 1 


63.75 . 


If Hq is true, this is an observation from a Normal (— j-, -j^ - ) dis- 
tribution; its level of significance thus equals .002 and is firmly 

rejected. 


Barlow (1968) has computed the power curve of the cumulative- total- 
time-on-test statistic when the true distribution is Gamma with shape 
parameters between 1 and 5 for an a = .05-level test based on n = 10 
observations; see Figure 4-7. Thus, for example, if the true distribution 
is Erlang-5, ’ the test will reject exponentiality with probability 2-0.69 . 

The cumulative-total-time-on-test statistic can be applied to data 
with censoring. If items are exposed to graphite fibers, but do not fail, 
they are eventually withdrawn from test. The withdrawal exposure level 
is noted and the data thus take the form of failure exposure levels and 
withdrawal exposure levels. The K-S-L approach has no provision for 
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Shape Parameter of True Gamma Distribution 

Figure 4-7 . — Power curve for test of exponential vs. Gamma based 
on cumulative- total-time- on-test statistic. 

handling these types of data. Fortunately, the cumulative- total- time- 
on-test approach handles these types of data easily; it is just a matter 
of redefining D^ #n , the total- time-on- test between the (i-1 )st and ith 

failure levels to accommodate withdrawals in this interval. And instead 
of computing , we compute , where k is the number of observed 
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failures. In view of this it is very important to make the distinction 
between failure times and times of withdrawal from testing of nonfailed 
items. 


4.2.3 Conclusions 

There are good tests for exponential vs. IHR, but it is hard to 
distinguish between them with small samples. This results in the hy- 
pothesis of exponentiality appearing plausible in many instances. Since 
to accept exponentiality when the distribution is IHR is a conservative 
error, exponentiality should be used as the failure distribution except 
in cases where knowledge of the vulnerabilities of the box allow a more 
detailed model which results in another failure distribution such as 
an Erlang. 

One unexpected feature of the data is that in data sets #15 and 
#16, the case with length 10 mm deviated more from exponentiality than 
the case with length 3 ram. However, this can be explained by the ran- 
domness inherent in the small samples. 

For simulation inputs, what is really needed is some upper confi- 
dence bounds on the failure distribution CDF or some method of handling 
an estimated distribution. 

Incorrectly assuming an exponential failure distribution when the 
distribution is actually IHR (e.g., Erlang-n with n > 1 ) is conservative 
in the left-hand tail but optimistic in the right-hand tail. That is, 
the exponential distribution with equal mean exposure to failure gives a 
higher probability of failure for low exposure levels and a lower prob- 
ability of failure for high exposure levels. This is illustrated in 
Figure 4-8, which compares the failure probabilities for exponential, 
Erlang-2, and Erlang-4 distributions with the same mean value (equal to 
1 here). Note that the exponential distribution gives conservative 
results for all exposures less than about 1.2 times the mean exposure 
to failure, at which point the failure probability is already very high — 
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equal to about 0.7. Thus, for the small exposure levels to which actual 
electronic equipment yill nprmally be subjected after a CF indident, the 
use pf an exponential failure model produces significantly conservative 
results. 

4 . 3 Confidence regions for IHR failure 
distribution 


In this risk study it is crucial to know the relationship between 
exposure level and failure probability for an electrical component. 

Since this failure probability distribution is estimated from data, there 
is uncertainty involved and confidence regions should be used. In this 
case an upper confidence bound on the failure probabilities is needed. 
Typically, there is a small chance of failure at the exposure levels we 
expect to encounter, thus we are primarily concerned with estimating 
(with confidence) the left tail of the failure distribution. 

To illustrate this problem, we consider some failure data col- 
lected by Westinghouse for a 7.5 KV insulator pin exposed to 5 millimeter 
fibers. Fifteen tests were performed. The failure data are presented 
in Figure 4-9 in the form of an empirical distribution function of fail- 
ure probability versus exposure. In accidental releases we expect to see 

5 3 

exposure levels up to 10 fiber sec/m ; consequently, we are concerned 
with the extreme left tail in Figure 4-9. 

We would like to estimate failure probabilities of this component 

for exposures in the neighborhood of 10 • There are two classical ways 
to do this (See Sections 5.2 and 5.3): we can get point estimates based 

on Binomial probabilities. The fact that 0 of 15 components failed at 

an exposure level of 10 leads to a 95% confidence statement: "Prob^. 

ability of component failure at exposure level of 10 5 fiber sec/m 3 is 
less than .181." The other method is to compute a Kolmogorov- Smirnov 

simultaneous 95% confidence region: "F(e) < F^Ce) = F 15 (e) + .304," 
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f 


where F(e) equals the probability of failure at exposure level e and 
F (e) equals the estimate of this probability from 15 observed failures. 

A 

(The boundary F u of this upper confidence region is plotted in Figure 

4-9.) The Kolmogorov- Smirnov gives a 95% confidence statement: "Prob- 

5 3 

ability of component failure at exposure level 10 fiber sec/m is less 
than . 304." 

Clearly, we would like to get more precise estimates. This is 
possible. It can be done by exploiting a physical property of the fail- 
ure process, namely, that it has an increasing hazard rate (IHR). This 
fact was discussed in the context of testing for exponential failure 
laws in Section 4.2. 


Let F be the failure distribution and f its density. Define 

h(e) = f (e)/(l-F(e)) and H(e) = / 0 e h(t)dt - -log(l-F(e) ) . The func- 

tion h is the hazard rate and H is the cumulative hazard or log sur- 
vival function. If the distribution is IHR, then h is monotone nonde- 

A 

creasing and H is convex. Let F^ be the K-S upper bound. If we 


assume an IHR failure distribution, then a 95% confidence region for F 

a 

will consist of all IHR distributions bounded by F . It turns out that 

u 

A 

there exists a maximal IHR distribution F among all distributions 

u, IHR 

A A A A 

bounded by F : Let H =-log(l-F ) and H TTir , be the greatest con- 
J u u B u u, IHR & 

A 

vex minorant of H (plotted Figure 4-10, then F = 

u u, 1HK 

1 - exp(-H TTT J. If F' is IHR and F' <. F , then F' < F T „_ . 

u, IHR — u — U,1HK 

A 

A 95% confidence region F _< F is indicated in Figure 4-11 for the 

— u, IHR 


data of Figure 4-9. 
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Figure 4-10. — Empirical cumulative hazard upper bound, H , and 
its greatest convex minorant • 
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Note that Figure 4-11 is a much more accurate region, especially 
in the left tail. This accuracy was gained by assuming a more restricted 
model, namely, an IHR failure distribution. For the previously consid- 

e O 

ered exposure level of 10 fiber sec/m , we. a,re 95% confident that the 
probability of failure is less than .0048. This is great improvement 
over the other estimates. 

4. 4 Sensitivity of the model to estimated 
parameters and distributions 

We have written a special simulation program designed to provide 
fast and economical sensitivity analyses of the risk profiles obtained 
as output of the GFRAP risk analysis. Moreover, this program follows a 
modified simulation approach which yields statistically valid confi- 
dence bounds on the risk profiles obtained. The program requires as in- 
put the probability distributions of damage per accident at each of the 
major airports being considered, and we used analytic approximations to 
the empirical damage distributions reported by Arthur D. Little, Inc. 
(ADL) in Kalelkar, et cd . (1979). We have performed an extensive series 
of sensitivity runs pf the program, and of an additional computer prog- 
ram written to provide partial analytic results when the analytic ap- 
proximations just mentioned take the form of Lognormal distributions. 

This section describes the simulation program, details our efforts to 
analytically approximate the different airport damage distributions, and 
finally presents the results of the sensitivity runs. 

4.4.1. The simulation program 

For ease in handling simulation error (to be discussed in Section 
5.1) we recommend the following procedure for the simulation: 

Step 1: Generate by Monte Carlo methods the number of accidents 

in a year. 

Step 2; Determine by Monte Carlo methods at which airport each 
of the accidents generated in Step 1 occurs. 
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Step 3: For each accident, determine the cost. 

Step 4: Total the cost of all the accidents for the year. 

Step 5: Repeat the above four steps n times (yielding a 

sample size of n years) . 

Step 6A: Compute the empirical national annual risk profile. 

directly for the sample of n years from the Step 4 
values. 

Step 6B: Compute the empirical national conditional (given 

one accident) risk profile directly for the sample 
of n years’ worth of accidents from the Step 3 
values. 

We have followed this procedure in our specialized simulation program, 
and will now give the essential details of that program, referring 
appropriately to the steps above. 

In Step 1 we assume that the number of accidents in a year is a 
random variable following a Poisson distribution; thus only the mean y 
of that distribution is required as an input quantity. While the assump- 
tion of a Poisson distribution seems to be generally accepted, and there 
are some theoretical bases for it, other discrete distributions or a more 
complex relationship could be accommodated by the program without signif- 
icant alteration. For example, the mean y of the Poisson distribution 
could be treated as a random variable following a specified probability 

distribution. 

In Step 2 it is assumed that each accident occurs at or in the 
vicinity of one of a given number, say , of airports. Each of these 

is characterized by a probability, say P ± for airport i , that an 

accidents occurring at one of the N a airports in fact occurs at airport 

i . That is, P => Prob (accident occurs at airport i | it occurs at one 

of the N airports) . These probabilities pertain to every simulated 
a 

accident, regardless of the airports at which other simulated accidents 
occur; i.e., the various accidents are treated independently with respect 
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to where they occur. The numerical values of the (i-l,...,N a ) are 

based on historical data (and possibly on future projections as well) 
pertaining to the weather conditions and numbers of operations, at the 
various airports. 

Step 3 is the critical one and must be dependent on the simulation 
model used to generate the random costs of accidents that take place at 
each of the airports. Such a model must be rather complex in order 

to account for various types and locations of accidents, CF dispersion 
under local weather conditions, local types and quantities of housing and 
industry, etc. Indeed, the development of such a model is one of the most 
significant aspects of the GFRAP, and both ADL and ORI have expended 
considerable effort to this end. Our specialized simulation program 
therefore requires the inclusion of subroutines that generate random 
values of accident cost that are solely dependent on the airport con- 
cerned. For the sake of future reference, let F, . denote the CDF of 

di 

the damage (cost) per accident at airport i , i=l,...,N a . To test our 

simulation program and to obtain sensitivity results we had to make some 
specific choices for the F^ . What we did is described below in 

Section 4.4.2. 

Steps 4 and 5 are self-explanatory and require no comment. Steps 
6 A and 6B are carried out by constructing, in each case, an aggregated 
relative frequency distribution and then converting this to the aggre- 
gated complementary cumulative relative frequency distribution that con- 
stitutes the empirical risk profile. The number and size of the cells 
u§ed to construct the aggregated distributions are specified as input to 
the program. It would not be practical, for large sample sizes, to save 
all the observed values because of their number and the computational 
cost of ordering then (there are n observed values that enter into 
Step 6A and approximately ny observed values that enter into Step 6B) . 
The first fqur moments of the empirical distributions are computed from 
the actual observed values. 
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4.4.2 Approximating the airport 
danjage distributions 

The only information that has been available about airport dam- 
age distributions is that contained in Table 10—1 of Kalelkar, et at. 
(1979). Recent results of both ADL and QRI indicate that more accurate 
and current damage distributions have associated with them significantly 
smaller damage values. As such results were not available when our pro- 
gram was developed and when our sensitivity runs were made, and are still 
not readily available, we based our work on the empirical damage distri- 
butions presented in ADL's Table 10-1. The distributions given there for 
26 different airports are based on 300 values each generated by the ADL 
damage simulation model. 

For each of the 26 distributions, ADL's Table 10-1 provides the 
mean, the standard deviation, the minimum and maximum observed values, 
and the following percentiles: 5th, 10th, 25th, 50th, 75th, 90th, and 

95th. All the minima are zero, as are many of the 25th percentiles. 

Using the given percentiles, we plotted all 26 distributions on log-log 
paper in the form of risk profiles (complementary CDF's). One of these, 
that for Washington, D.C.'s National Airport, is shown in Figure 4-12. 

The right-hand portion of the curve is shown as a dotted line to empha- 
size the fact that this portion is heavily dependent upon the maximum 
value observed and said maximum might vary significantly among samples 
of size 300. (We associated with the maximum value a risk probability 
of 1/300. )_ 

We considered using piecewise-linear approximations to the observed 
distributions in Step 3 of our simulation program, but decided against it 
for two reasons. First, and most important, this would not provide us 
with a convenient mechanism for performing sensitivity analyses through 
controlled changes in the damage distributions used in Step 3. Second, 
the smooth form of the curve in Figure 4-12 ( and of the ones for other 
airports) suggests that a piecewise log-log or log-linear approximation 
might be more appropriate. We therefore concentrated on analytic ap- 
proximations to the observed damage distributions. 
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Examination of the 26 empirical distributions in Table 10-1 of 
Kalelkar et al. (1979) shows them to be considerably skewed to the right; 
the coefficients of variation range from 1,33 to 3,40 and the ratios of 
mean to median range from 1.32 to 133. We therefore considered several 
families of distributions which possess the characteristics of skewness 
to right and nonnegativity (of the associated random variable). Such 
families include the Weibull, Gamma, Pareto, and Lognormal families [see 
Mann et al. (1974)]. Our choice of families to examine was partly in- 
fluenced by the manner in which we decided to fit the empirical distri- 
butions and verify the adequacy of the fit. We wanted to determine the 
two parameters of the theoretical distribution from the mean and standard 
deviation of the empirical one, and then to compare the two risk profiles. 
This is difficult to do with the Weibull and Gamma distributions for the 
following reasons: In the case of the Weibull distribution one cannot 

find directly the two parameters as closed-form functions of the empirical 
mean and standard deviation; a nonlinear equation involving the Gamma 
function must be solved. In the case of the Gamma distribution one cannot 
obtain the CDF in closed form so calculation of the theoretical risk pro- 
file is extremely difficult. 

Both the Lognormal and Pareto distributions fit our computational 
requirements, and examination of the 26 corresponding Lognormal risk 
profiles showed many of them to fit the empirical risk profiles reason- 
ably well. The Pareto distribution was tried for several cases but did 
not fit quite as well as the Lognormal. 

For the purposes of testing the simulation program we therefore 
decided to use Lognormal distributions with means and variances equal to 
those of the empirical distributions. Figures 4-13 through 4-21 show a 
representative selection of the corresponding risk profiles — the ADL 
empirical ones and the Lognormal ones used to fit them. 

We did not close the door on attempts to fit the empirical distri- 
butions; as a next step we examined the possibility of fitting the empir- 
ical distributions by mixtures of Lognormal distributions. That is, 
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Figure 4-16. — Houston risk profile 
































Figure 4-21. — St. Louis risk profile 





T-419 


instead of approximating the empirical CDF ? ± for airport i by a 
Lognormal CDF F^ , we approximated it by the convex combination of 
Lognormal CDF's 

I i 

where each F^ a Lognormal CDF, each 1^ is a small positive 

integer such as 2 or 3 (or even 1), the are positive, and ^ 

1 . A difficulty with this approach is the lack of a clear criterion to 
use in choosing 1^ , the q^ , and the F^^ . Even with a clear 

criterion, the computational effort would be considerable. There is no 
doubt, however, that decidedly better fits could be obtained this way. 
Figures 4- 22(a) and 4- 22(b) show fits to the Philadelphia airport risk 
profile by a single Lognormal and by a mixture of two Lognormals, re- 
spectively; the parameters in the case of the mixture were found heu- 
ristically. The improved fit is readily apparent. Figures 4-23(a) and 
4-23(b) show similar fits to the Los Angeles airport risk profile. 


An interesting result is the following: Let F be the national 

conditional (given one accident) CDF whose corresponding risk profile is 

estimated in Step 6B of the specialized simulation program. If each CDF 

F,. is assumed to be Lognormal, say F . , then 
di X4- 


N 

a 

F - l P<F 
i=l 


i U ’ 


so that F is a mixture of Lognormals. If instead each F^ is 
assumed to be the mixture of Lognormals ^ ^ j * then 


N 


= l l 

i=l j-1 


P i q ij F £ij 


so that F is again a mixture of Lognormals. 
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Figure 4-2z(b). — Los Angeles risk ■profile 
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Because the empirical distributions given in Table 10-1 of 
Kalelkar, et at. (1979) were known to be outdated, we decided against 
any further attempt to fit than with analytical distributions. We 
decided to use 26 Lognomial distributions, with the s^ame means and vari- 
ances as those of the empirical distributions as the basis of our sen- 
sitivity analyses. We did, however, investigate the effect of replacing 
the Lognormal distributions by mixtures of Lognormal distributions. 
Exactly what was done will be detailed below, but the results may be 
summarized by the statement that replacing the Lognormal distributions 
by mixtures of Lognormal distributions had a negligible effect on the 
national conditional and annual risk profiles obtained, even though the 
mixture of Lognormals yields a better fit of the individual airport dam- 
age distributions. 

4.4.3 The sensitivity analyses 

Following the ADL analyses we used = 26 and used for the p.^ 

the values given in the last column of Table C-4 of Kalelkar et al. 
(1979). For our base case we chose the 26 Lognormal distributions to 
have means and standard deviations equal to those of the empirical dis- 
tributions generated by ADL, and we set y = 2.6 . This parameter y 
is the expected number of accidents in a year, and the value of 2.6 cor- 
responds to the agreed-upon projection for 1993. 

Because of the assumption of Lognormal distributions of damage 
per accident at the various airports, the national conditional risk 
P r °fil e (given one accident) is the complementary cumulative distribu- 
tion function (CDF) of a mixture of Lognormal CDF's. That is, 

N 

F " 2 P i F £i » 

i=l 1 X ' 1 

where is. the Lognormal CDF of damage per accident at airport i 

and — (1— F) is the national conditional risk profile (given one 
accident) . It is thus possible to compute the values of the risk 
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profile R exactly with the aid of a table or computer program that pro- 
vides CDF values of the standard normal distribution. (This, is not true 
for the annual risk profile, however.) A separate computer program was 
therefore written to compute R this way, and it was used in adddition 
to check the corresponding risk profiles generated by the Monte Carlo 
simulation program (which generates both risk profiles). 

The annual risk profiles for various sensitivity runs were gen- 
erated by the simulation program; in each case the year 1993 was simu- 
lated 4000 times. 

We performed three separate types of sensitivity analyses. In 
the first, we varied the means of the Lognormal distributions of damage 
per accident at the various airports and kept the standard deviations 
the same. In the second, we varied the standard deviations of the Log 
normal distributions of damage per accident at the various airports and 

kept the means the same. In both cases all 26 means or standard devia- 
tions were changed by the same percentage. In the third type of sensi- 
tivity analysis we varied p > the mean number of accidents in a year. 

This third type of sensitivity analysis affects the annual risk profile 
but not the national conditional risk profile (given one accident) , 
whereas the first two types affect both risk profiles. 

The results are shown in Figures 4-24 through 4-28. Figures 
4-24 and 4-25 show the national conditional risk profiles R for various 
changes in the means (Figure 4-24) and standard deviations (Figure 4-25) 
of the distributions of damage per accident at each airport. Each risk 
profile is identified by the ratio of the value of the parameter (mean 
or standard deviation) to its value in the base case. Tables 4—3 and 
4-5 show numerically some of the results shown graphically in Figures 
4-2.4 and 4-25; no exact values are given since they are not likely to be 
truly meaningful, but instead ratios are given to indicate the effects 
of the change in means or standard deviations of the distributions of 
damage per accident at each airport. Tables 4-4 and 4-6 show the ratios 
of the means and standard deviations of the national conditional risk 
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Figure 4-24— National conditional risk profile R with changes in the means of 
the airport damage distributions (26 airports). 
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TABLE 4-3 

SENSITIVITY OF THE NATIONAL CONDITIONAL RISK PROFILE R TO 
CHANGES IN THE MEANS OF THE AIRPORT DAMAGE DISTRIBUTIONS 

KEY: All entries in the table are ratios of 

(1) the values obtained when the means 
are changed to (2) the corresponding 
values for the base case. The ratio of 
airport damage distribution mean to 
that of the base case is r and d is 
a numerical damage value. 


P {Damage > d} 

(a) 

r = 0.5 

Damage Value d 

r - 1. 5 

r = 2.0 

r = 3.0 

3 x 

io " 1 

.31 

1.87 

2.69 

4.06 

1 x 

io " 1 

.45 

1.53 

2.07 

2.98 

3 x 

10" 2 

.52 

1.30 

1.60 

2.06 

1 x 

io -2 

.62 

1.21 

1.39 

1.63 

3 x 

IO " 3 

.75 

1.13 

1.20 

1.31 

1 x 

IO " 3 

.84 

1.07 

1.10 

1.12 

3 x 

io -4 

1.00 

1.06 

1.06 

.88 

1 x 

io " 4 

1.04 

.91 

.81 

.69 

3 x 

io " 5 

1.20 

.83 

.72 

.60 

Damage 

Value d 

(b) P{Damage > d } 
r = 0.5 r = 1.5 

r = 2.0 

r = 3.0 

.03 

X 10 6 

.50 

1.36 

1.53 

1.65 

.1 

x 10 6 

.42 

1.65 

2.21 

2.81 

.3 

x 10 6 

.42 

1.76 

2.76 

4.33 

1 • 

x 10 6 

.46 

1.53 

2.21 

4.00 

3 

x 10 6 

.65 

1.30 

1.50 

1.95 

10 

x 10 6 

1.07 

.79 

.48 

.23 
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TABLE 4-4 

SENSITIVITY OF THE NATIONAL CONDITIONAL RISK DISTRIBUTION TO 
CHANGES IN THE MEANS OF THE AIRPORT DAMAGE DISTRIBUTIONS 

KEY: All entries in the table are ratios of 

(1) the values obtained when the means 
are changed to (2) the corresponding 
values for the base case. The ratio of 
airport damage distribution mean to 
that of the base case is r . 


Ratio r 

Mean of the National 
Conditional Risk 
Distribution 

Standard Deviation of the 
National Conditional Risk 
Distribution 

.33 

.33 

.97 

.50 

.50 

.98 

.67 

.67 

.98 

1.5 

1.50 

1.04 

2.0 

2.00 

1.09 

3.0 

3.00 

1.23 
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TABLE 4-5 

SENSITIVITY OF THE NATIONAL CONDITIONAL RISK PROFILE 
R TO CHANGES IN THE STANDARD DEVIATIONS OF THE 
AIRPORT DAMAGE DISTRIBUTIONS 

KEY: All entries in the table are ratios of 

(1) the values obtained when the stan- 
dard deviations are changed to (2) the 
corresponding values for the base case. 

The ratio of airport damage distribution 
standard deviation to that of the base 
case is r and d is a numerical dam- 
age value. 


P {Damage > d} 

(a) Damage Value d 
r - 0.5 r = 2.0 

r = 5.0 

r = 10.0 

3 x 

io -1 

1.23 

.60 

.35 

.19 

1 X 

10 _1 

.92 

.84 

.65 

.44 

3 x 

io -2 

.79 

1.06 

1.04 

.89 

1 X 

I— 1 

O 

1 

ro 

.72 

1.33 

1.56 

1.44 

3 x 

IO" 3 

.60 

1.49 

2.02 

2.23 

1 X 

10” 3 

.52 

1.64 

2.59 

3.12 

3 x 

H* 
O 
. 1 

.47 

1.87 

3.41 

4.53 

1 X 

I 

o 

r— 1 

.41 

2.03 

4.14 

6.04 

3 x 

10 -5 

.38 

2.38 

5.48 

8.33 

Damage Value d 

(b) PiDamage > d } 

r = 0.5 r = 2.0 

r = 5.0 

r = 10.0 

.03 

x 10 6 

1.27 

.76 

.55 

.41 

.1 

x 10 6 

1.30 

.74 

.54 

.40 

.3 

x 10 6 

.86 

.85 

.68 

.54 

1 

x 10 6 

.51 

1.34 

1.34 

1.19 

3 

x 10 6 

.21 

2.13 

2.99 

2.99 

10 

x 10 6 

.03 

3.86 

8.48 

10.98 
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TABLE 4-6 

SENSITIVITY OF THE NATIONAL CONDITIONAL RISK DISTRIBUTION 
TO CHANGES IN THE STANDARD DEVIATIONS 
OF THE AIRPORT DAMAGE DISTRIBUTIONS 

KEY: All entries in the table are ratios of 

(1) the values obtained when the stan- 
dard deviations are changed to (2) the 
corresponding values for the base case. 

The ratio of airport damage distribution 
standard deviation to that of the base 
case is r . 


Ratio r 

Mean of the National 
Conditional Risk 
Distribution 

Standard Deviation of the 
National Conditional Risk 
Distribution 

0.5 

1.0 

.55 

2.0 

1.0 

1.95 

5.0 

1.0 

4.84 

10.0 

1.0 

9.68 
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profile R to those of the base case when the indicated changes are 
made in the means (Table 4-4) or standard deviations (Table 4-6) of the 
individual airport damage distributions. 

Figures 4-26 and 4-27 are comparable to Figures 4-24 and 4-25, 
respectively, but are for the annual risk profile, call it S , instead. 
Tables 4-7 through 4-10 show the results numerically, and are comparable 
to Tables 4-3 through 4-6, respectively. 

Figure 4-28 shows the changes in the annual risk profile S as 
y changes; note that increasing values of y produce increasingly 
conservative annual risk profiles as defined later. Table 4-11 shows 
numerically some of the results shown graphically in Figure 4—2 ; again, 
only ratios are given. Table 4—12 shows the ratios of the means and 
standard deviations of the annual risk profile S to those of the base 
case when the indicated changes are made in y . 

4.4.4 Investigation of the effect of 
approximating the airport 
damage distributions by 
lognormal distributions 

As indicated above, the empirical individual airport damage dis- 
tributions are better fit by mixtures of Lognormal distributions than 
by a single Lognormal distribution. We wished to see whether using such 
better approximations to the airport damage distributions would signifi- 
cantly alter the results of the sensitivity analyses. It was not prac- 
tical to carry out the process of fitting all 26 empirical distributions 
by mixtures of Lognormal distributions, so we decided to proceed differ- 
ently. We assumed, in effect, that there are only 13 airports, all of 
whose damage distributions can be fit by mixtures of two Lognormal dis- 
tributions. This yields 26 Lognormal distributions in all, which we 
took to be the ones utilized for the base case described above in Sec- 
tion 4.4.3. That base case becomes, in terms of the 13 fictitious air- 
ports, the most exact representation available, in that each of the 
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TABLE 4-7 

SENSITIVITY OF THE ANNUAL RISK PROFILE S TO CHANGES 
IN THE MEANS OF THE AIRPORT DAMAGE DISTRIBUTIONS 

KEY: All entries in the table are ratios of 

(1) the values obtained when the means 
are changed to (2) the corresponding 
values for the base case. The ratio of 
airport damage distribution mean to 
that of the base case is r and d is 
a numerical damage value. 


P {Damage > d} 

(a) 

r = 0 

Damage Value d 

•5 r = 1.5 

r = 2.0 

r = 3.0 

3 x 

h— 1 
O 

1 

.38 

1.75 

2.52 

3.87 

1 X 

10" 1 

.46 

1.54 

2.03 

2.89 

3 x 

10“ 2 

.59 

1.35 

1.76 

2.45 

1 X 

CM 

1 

O 

T— i 

.69 

1.22 

1.53 

1.89 

3 x 

10“ 3 

.79 

1.08 

1.26 

1.48 

1 X 

10“ 3 

.99 

1.06 

1.15 

1.27 



(b) 

P {Damage > d] 



Damage 

Value d 

r = 0. 

5 r = 1.5 

r = 2.0 

r = 3.0 

.03 

X 10 6 

.76 

1.10 

1.13 

1.13 

.1 

x 10 6 

.52 

1.17 

1.26 

1.38 

.3 

x 10 6 

.41 

1.53 

1.97 

2.56 

1 

x 10 6 

.34 

2.11 

3.32 

5.92 

3 

X 10 6 

.54 

1.46 

3.34 

7.80 
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TABLE 4-8 

SENSITIVITY OF THE ANNUAL RISK DISTRIBUTION TO CHANGES 
IN THE MEANS OF THE AIRPORT DAMAGE DISTRIBUTIONS 

KEY: All entries in the table are ratios of 

( 1 ) the values obtained when the means 
are changed to (2) the corresponding 
values for the base case. The ratio of 
airport damage' distribution mean to 
that of the base case is r . 


Ratio r 

Mean of the Annual 
Risk Distribution 

Standard Deviation 
of the Annual Risk 
Distribution 

.33 

.33 

.92 

• 

Ln 

o 

.50 

.93 

.67 

.67 

.95 

1.5 

1.50 

1.10 

2.0 

2.00 

1.22 

3.0 

3.00 

1.53 
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TABLE 4-9 

SENSITIVITY OF THE ANNUAL RISK PROFILE S TO 
CHANGES IN THE STANDARD DEVIATIONS OF 
THE AIRPORT DAMAGE DISTRIBUTIONS 

KEY: All entries in the table are ratios of 

(1) the values obtained when the stan- 
dard deviations are changed to (2) the 
corresponding values for the base case. 
The ratio of airport damage distribution 
standard deviation to that of the base 
case is r and d is a numerical dam- 
age value. 


P{Daraage > d} 

• 

Is ° 

v - II 

u 

Damage Value d 
5 r = 2.0 r = 5.0 

r = 10.0 

-1 

3 x 10 

1.02 

.71 .47 

• 34 

_ . .-1 

1 x 10 

.89 

.95 .78 

.62 

_ --1 

3 x 10 

.81 

1.17 1.17 

1.10 

1 _ * —2 

1 x 10 

.72 

1.31 1.52 

1.71 

3 x 10 -3 

.62 

1.58 2.05 

2.58 

1 x 10" 3 

.54 

2.00 4.06 

6.00 

Damage Value d 

(b) P {Damage > d] 
r = 0.5 r = 2.0 r = 5.0 

r = 10.0 

.03 x 10 6 

1.10 

.95 .74 

.61 

.1 x 10 6 

1.16 

. 81 . 61 

.50 

.3 x 10 6 

1.03 

.76 .59 

.47 

1 x 10 6 

.85 

1.12 1.00 

.85 

3 x 10 6 

.18 

2.00 2.47 

2.62 
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TABLE 4-10 

SENSITIVITY OF THE ANNUAL RISK DISTRIBUTION TO 
CHANGES IN THE STANDARD DEVIATIONS OF 
THE AIRPORT DAMAGE DISTRIBUTIONS 

KEY: All entries in the table are ratios of 

(1) the values obtained when the stan- 
dard deviations are changed to (2) the 
corresponding values for the base case. 
The ratio of airport damage distribution 
standard deviation to that of the base 
case is r . 


Ratio r 

Mean of the Annual 
Risk Distribution 

Standard Deviation 
of the Annual Risk 
Distribution 

0.5 

1.0 

.61 

2.0 

1.0 

1.87 

5.0 

1.0 

4.58 

10.0 

1.0 

9.14 
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TABLE 4-11 

SENSITIVITY OF THE ANNUAL RISK PROFILE S TO CHANGES 
IN THE EXPECTED NUMBER OF ACCIDENTS IN A YEAR 

KEY: All entries in the table are ratios of 

(1) the values obtained when the ex- 
pected number of accidents in a year is 
changed to (2) the corresponding values 
for the base case. The ratio of the 
expected number of accidents to that of 
the base case is r and d is a numer- 
ical damage value. 


P {Damage > d} 

(a) 

r = 0. 

Damage Value d 
50 r - 0.67 

r = 1.5 

r = 2.0 


3 x 10 

.39 

.61 

1.51 

2.12 

.-1 

1 x 10 

.53 

.67 

1.36 

1.83 

-2 

3 x 10 

.59 

.73 

1.29 

1.67 

-9 

1 x 10 

.62 

.74 

1.23 

1.52 

3 x 10" 3 

.61 

.76 

1.19 

1.48 

1 x 10" 3 

.68 

.81 

1.25 

1.47 

Damage Value d 

(b) 

r = 0. 

PiDamage > d } 
50 r = 0.67 

r - 1.5 

r = 2.0 

.03 x 10 6 

.69 

.82 

1.11 

1.20 

.1 x 10 6 

.54 

.70 

1.23 

1.38 

.3 x 10 6 

.45 

.61 

1.52 

1.97 

1 x 10 6 

.40 

.55 

1.69 

2.85 

3 x 10 6 

.28 

.49 

1.62 

2.75 
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TABLE 4-12 

SENSITIVITY OF THE ANNUAL RISK DISTRIBUTION TO CHANGES 
IN THE EXPECTED NUMBER OF ACCIDENTS IN A YEAR 

KEY: All entries in the table are ratios of 

(1) the values obtained when the ex- 
pected number of accidents in a year is 
changed to (2) the corresponding values 
for the base case. The ratio of the 
expected number of accidents to that of 
the base case is r . 


Ratio r 

Mean of the Annual 
Risk Distribution 

Standard Deviation 
of the Annual Risk 
Distribution 

0.50 

.50 

.71 

0.67 

.67 

.82 

1.5 

1.50 

1.22 

2.0 

2.00 

1.41 
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13 damage distributions is represented by a mixture of two Lognormal 
distributions. 

We then approximated each of the 13 damage distributions by a 
single Lognormal distribution. To do this we first randomly combined 
the 26 Lognormal distributions to obtain 13 pairs. For each such pair 
we found the first two moments of the mixed distribution (with mixing 
probabilities proportional to the appropriate p_^ ) , and then approxi- 
mated the mixed distribution by a single Lognormal distribution with 
the given first two moments. This yielded Lognormal approximations for 
the damage distributions at the 13 fictitious airports. 

Using the 13 Lognormal distributions we exercised the two computer 
programs discussed above for the base case and also for all of the sensi- 
tivity analysis runs previously described for the case of 26 airports. 

The results are shown in Figures 4-29 through 4-33, which corre- 
spond to Figures 4-24 through 4-28, respectively. Not only do they cor- 
respond, but in each of the five cases the two sets of curves are al- 
most carbon copies. That is, with respect to the scenario of 13 air- 
ports, approximations of the 13 airport damage distributions by (a) 
Lognormal distributions and by (b) mixtures of two Lognormal distribu- 
tions produce almost exactly the same results. This certainly lends 
credibility to the validity of the sensitivity analysis results obtained 
for the scenario of 26 airports. Those results will now be discussed. 

4.4.5 Discussion of the results 
of the sensitivity 
analyses 

This discussion is divided into three parts, corresponding to 
changes in (a) the means of the airport damage distributions, (b) the 
standard deviations of the airport damage distributions, and (c) the 
expected number of accidents in a year. It is based on Tables 4-3 
through 4-12. 
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The ratio of the airport damage 
distribution mean to that of the 
base case is r . 


Figure 4- 29. --National conditional risk profile E with changes in the means of 
the airport damage distributions (13 airports). 














[Prob {Damage > d} 


ssas 


The ratio of the airport damage 
distribution standard deviation 
to that of the base case is r 


i-j 

igure 4- 20^— National conditional risk profile R with changes in the stan dar d 
deviations of the airport damage distributions (12 airports) . 
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Tables 4-4 and 4-8 show that when the means of all the airport 
damage distributions are changed in the same proportion, then the means 
of the national conditional and annual damage distributions are changed 
in that same proportion but the standard deviations of these two distri- 
butions are changed to a much smaller degree. The standard deviation 
of the annual damage distribution is changed somewhat more than that of 
the national conditional damage distribution. With respect to the two 
risk profiles, Tables 4-3 and 4-7 can be summarized by stating that the 
damage value (for a given exceedence probability) and the exceedence 
probability (for a given damage value) are both changed in roughly the 
same proportion as the change in the means of the airport damage dis— 
istributions . 

Tables 4-6 and 4-10 show that when the standard deviations of all 
the airport damage distributions are changed in the same proportion, 
then the means of the two damage distributions are unchanged, whereas 
both their standard deviations are changed in almost (but slightly less 
than) that same proportion. With regard to the two risk profiles. 

Tables 4-5 and 4-9 indicate that, in general, the damage value (for a 
given exceedence probability) and the exceedence probability (for a 
given damage value) are both changed in somewhat less than the same 
proportion as the change in the standard deviations of the airport 
damage distributions. 

A change in y , the expected number of accidents in a year, has 
no effect, of course, on the national conditonal damage distribution, 
but Table 4-12 shows that it changes the mean of the annual damage dis- 
tribution in the same proportion and the standard deviation of this 
distribution in a smaller proportion ( the square root of the previous 
one) . Table 4—11 shows that the effect on the annual risk profile is 
roughly proportional to the change in y . 

In overall summary, then, our sensitivity analyses show that 
changes in the means or standard deviations of the airport da ma ge dis- 
tributions or in the expected number of accidents in a year produce 
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roughly proportional changes In the national conditional and annual risk 
profiles. Since changes in these risk profiles of less than a factor of 
5 are probably not deemed very significant, it seems fair to say that 
the risk profiles are not overly sensitive to the changes that were 
investigated. 

5. Simulation Model Design and Treatment 
of Sampling Error 

The third major source of error in the risk analysis modeling 
stems from sampling error in exercising the simulation. Providing the 
simulation is done properly, statistical techniques are available to 
treat this type of error. 


5.1 Simulation model design 


One current simulation model generates, by Monte Carlo simulation, 
a conditional (given an accident) risk profile for each airport. Denot- 
ing the CDF generated for airport i by F^ (x) and letting p ± 

represent the conditional probability that an accident occurs at airport 
i , given that it happens at some airport in the U.S., then the condi- 
tional (given an accident somewhere in the United States) risk profile, 

denoted by l-F^(x) , is obtained from 

F (1) (x) = PfFi^Cx) +P 2 F 2 1)(x) + ••• +p 26 F 26 )(x) ’ 


Even assuming that exact confidence bounds could be found for the 
p^(x) , and that the p^s we re exact and not estimated, it would 
still remain a most difficult (if at all possible) task to obtain exact 
confidence bounds on F^^ (x) . 


Further, to set the unconditional national annual risk profile, 
the F^(x) are convolved as follows: 
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Ffx) =* P(0) + P(l)F (1) (x) +P(2)F (2 ) (x) + ... , 

where F^(x) is the n-fold convolution of F^(x) with itself and 
P(i) is the probability of i accidents in a year. Even assuming the 
P(i) are exact and that confidence bounds could somehow de determined 

for F^ ^ (x) , the convolution procedure makes it very difficult to ob- 
tain confidence bounds on F(x) (especially if exact bounds are desired). 

In order to take advantage of the available statistical theory 
for calculating vcCtid confidence hounds, it is necessary to modify 
the Monte Carlo simulation procedure sketched above. As indicated, two 
major problems which prevent the use of valid statistical procedures 
in the above modeling design are (1) probabilistic mixing of individ- 
ual airport conditional (on one accident) risk profiles to get a national 
conditional risk profile, and (2) convolving the national conditional 
risk profile to obtain the unconditional national risk profile. The mix- 
ing and convolving of CDFs (or complementary CDFs such as the risk pro- 
files) invalidate the available statistical theory. 

To get around these problems, we suggest the modified simulation 
procedure of Section 4.4.1, which we repeat here: 

Step 1: Generate by Monte Carlo methods the number of accidents 

in a year. 

Step 2: Determine bv Monte Carlo methods at which airport each 

of the accidents generated in Step 1 occurs. 

Step 3: For each accident, determine the cost. 

Step 4: Total the cost of all the accidents for the year. 

Step 5: Repeat the above four steps n times (yielding a 

sample size of n years). 

Step 6A: Compute the empirical national annual risk profile 

directly for the sample of n years from the Step 
4 values. 
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Step 6B: Compute the empirical national conditional (given one 

accident) risk profile directly for the sample of n 
yeats/ worth of accidents from the Step 3 values. 

In this way, the statistical theory discussed in the following 
sections can be applied to both the national conditional risk profile 
and the unconditional national annual risk profile. This may result in 
larger sample sizes than presently being used, but the advantage is that 
the sample sizes required for the amount of confidence and precision 
desired can be computed in advance. The following sections illustrate 
these computational procedures. 

5.2 Pointwise confidence bounds 


The following methodology allows for confidence bound statements 
at a single point only. 

5.2.1 Binomial bounds on the risk 
for a single value 

If it is desired to obtain bounds on the risk for a particular 
value x Q , then the binomial distribution can be used to obtain well- 

accepted approximate bounds. Assuming the number of independent simu- 
lation runs, n , is large enough for the normal approximation to the 
binomial ( [nF(x Q ) ] and n[l-F(x Q )] should be > 5) , approximate 

100(l-a)% confidence bound are 

1-VV 1 -f 

Note that the band width of the bound gets smaller for large Xq , 

which is the property we desire; but this confidence statement is good 
at a single point x^ only, and not for all Xq simultaneously. 

If the normal approximation to the binomial is not adequate (which 
is probably the case for the sample sizes anticipated and the small tail 
probability of interest) then either the Poisson approximation to the 
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binomial or the exact binomial itself must be used. For example, sup- 
pose r values greater than x^ are observed in the sample of size n . 

Then, we desire to find the largest value of p for a binomial distribu- 
tion such that P(X<r | n,p) ^ a , where X is a binomial random vari- 
able with parameters n and p , and 1-ot is the confidence level de- 
sired. Denoting this value by p , a one sided 10Q(l-a)% confidence 
interval estimate of R(xq) is (0,p) . As an illustration, let us 
assume that in n simulation runs no values greater than Xq are ob- 
served. Then r=0 and 

P(X<0 | n,p) = P(X=0 | n,p) = (l-p) n = a ; 

hence, 

A , 1/n 

P = 1-a 

For values of r>0 numerical search procedures would be necessary to 

A 

find p . 


5.2,2 Nonparametric tolerance 
limits 


The prediction approach using tolerance limits as described here 

is also distribution-free. Let , .... X^ be the order 

n n n 

statistics from a sample of n observations from the distribution with 

CDF F(x) . The problem is to predict the (n+1) st' observation, X , 

rrfl 

which occurs in the future. Intervals of the form 


r(r> 


< X 


n+1 


< X 


(s) 


n 


are used [see Aitchison and Dunsmore (1975)]. There are two measures of 
precision: mean coverage and guaranteed coverage (tolerance limit) 

intervals. 
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5. 2, 2,1 Mean coverage; Considering that 

r Cs) 


<; x , . < x' 

n n+1 n 


s-r 
n+1 ’ 


is follows that "on the average" (hence the name mean coverage) the in- 
terval (X^ , will cover the next observation with a proportion 

(s-r) /(n+1) of the instances when the procedure is repeated. Note that 

P(X ,, > X (n) = 1/ (n+1) . Care must be taken in applying this procedure; 
n+1 n 


for example, one possible misinterpretation would be to look at the data, 

> $9M) = 1/301 . 

iit-l 

The problem is that 


note that X 3 qq^ = and then conclude that 


P (w I X n n)= ") * P ( X n + l ■= X n n> ) • 

The left-hand side equals = and is not distribution-^ 

free. 


5. 2. 2. 2 Guaranteed coverage (tolerance limits): In this case two 

values, a and y , are specified, where y is the probability of cov- 
erage and 1-a is the guarantee or confidence. The desired interval 
satisfies 


Thus we are 
fall in (X 


P 


F(X (s) ) - F(X <r) ) 
n n 



100 ( 1-a) % confident that 100y% 
( ‘ T \ X^) . The probability 


1-a . 


of the population will 


F(X (S) ) - F(X (r) ) = y! 
n n 11 


is distribution-free and can be expressed in terms of an incomplete beta 
function. See Aitchison and Dunsmore (1975), David (1970), and Walpole 
and Myers (1978). 
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5.2.3 An upper binomial bound 
for R(Xq) , for a 
fixed Xq 


Using a normal approximation to the binomial, the half-width of 
such a sonfidence region will be 


V 


R( x q ) [l-R(x^) ] 


a I n 

to achieve 100(l-cx)% confidence. In the tails this will be approxi- 
.mately z V R( x Q )/n . If we are looking at x^ which corresponds to a 


tail probability of 10 d and wish to be 100(l- a )% confident that our 

estimation error is also less than 10 , then we must approximately 

satisfy 


z VlO d /n = 10~ d . 
a 

For example, if we want to be 99% confident when the tail probability is 
of the order of 10 4 , then n satisfies z m VlO 4 /n = 10 4 or 

* ui 

_ 2 

2.326/ /n = 10 , which implies Jn = 232.6 ; and hence n = 54,103 s 

4 

5.4x10 . In general, to make 100(l-a)% confidence statements about 

R(Xq) when it is of the order of 10~ d , one needs n = z „10 d obser- 
vations, for fixed x Q . If it turns out that nR(x Q ) is likely to be 

less than five so that the normal approximation might not be valid — 
then the exact binomial distribution would have to be used, necessitating 
a numerical search procedure to find n . 
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5.2.4 Mean coverage prediction 
intervals 


If a sample of size n is drawn and 


Y (l) C2) y(n) 

n > n • •••* n 


are 


the order statistics, then the prediction interval (X^ r \ 


has mean coverage of the (n+l)§t observation equal to (s-r)/(n+l) . 

An interval of the form (0, X^ n ^ has mean coverage equal to n/(n+l) 

-d d 

Thus, in order to get mean coverage of 1-10 , approximately 10 

-4 

observations are required. Hence for a probability of 10 for the 
next observation exceeding the largest of the n samples, n must 
4 

be 10 . 


The interpretation of mean coverage is that if this procedure is 
used many times the resulting intervals (one for each repetition of the 
procedure) will cover the (n+l)s£ observation a proportion of times 
equal to the mean coverage or, equivalently, will not cover it a* propor- 
tion equal to one minus the mean coverage. 

5.2.5 Guaranteed coverage prediction 
(tolerance) intervals 

We are interested in distribution-free prediction intervals of 
(s) 

the form (0, X^ ') which satisfy 

p{f(x^ s) ) > y) = 1-a 

for a specified coverage value y and guarantee probability 1-a 
(this is a one-sided version of the type discussed in Section 5.2. 2.2. 

( g ) 

Here F(X^ ) has a beta distribution with parameters s-1 and n-s 
with density 
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n! 


s-1 


(s-1) J (n-s) ! 


(1-u) 


n-s 


n 

It is asymptotically normal with mean s/(n+l) and variance 
s(n-s+l)/[(n+l) (n+2) ] . If s/n - p , i.e., is the pth sample 

fractile, then F(x^ ') is asymptotically normal (p, [p(l-p)]/n) 


Consider the case of using (0, X^ n ^) as a guaranteed coverage 
prediction interval; then 

P{F(X^ n) )> Y ) = 1 - (1-Y) n = 1-a 

and 


n = log(a) 
log(Y) * 

The sample sizes required to achieve preassigned values of 1— a and y 
are given in Table 5-1. 


TABLE 5-1 


SAMPLE SIZES FOR A ONE-SIDED TOLERANCE LIMIT 


1-a 

.99 

.995 

Y 

.999 

.9995 

.9999 

.95 

299 

. 598 

2994 

5990 

29956 

.99 

458 

919 

4603 

9208 

46049 

.995 

527 

1057 

5296 

10594 

52980 

.999 

687 

1378 

6904 

13812 

69074 


Thus, using this procedure, if we wish to deal with tail probabilities 
-4 

of 10 (or coverage of ,9999) with 99% guarantee, we need 46,000 
observations. Tables similar to Table 5-1 may be found in Walpole and 
Myers (1978), page 550. 
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A misuse of the above procedure that is tempting but not correct 
is the following. Suppose a particular break-off point such as $10M 
annual cost is of interest. Take a sample of size n and note the 

( s ) 

smallest order statistic that is greater than 1QM, cabling it . 

Using this, compute P{F(X^) > y Q l for some desired y Q , and obtain 
a guarantee probability 1 — ■ Wq . Then, make the statement that (0,$10M) 
has a 100(l-cO% guarantee of coverage of 100 Yq% . The above is not 
justified; it is called data snooping. 

5.3 Simultaneous confidence bounds 

The following methodology allows for simultaneous confidence 
bound statements over the entire risk profile. 

5.3.1 Kolmogorov- Smirnov (K-S) 
type confidence bounds 

The K-S statistic gives the maximum deviation between an empirical 

and a true CDF and is denoted by D , so that 

n 

A 

D = sup | F (x) - F (x) I . 
n n 1 

x 

Durbin (1973), Hoel, Port, and Stone (1971); Dixon and Massey (1969); 
and Breiman (1973) include discussions of this statistic, both for test- 
ing Hq : F = Fq and for constructing simulataneous confidence bounds on 

the unknown F . Dixon and Massey (1969) refer to the K-S statistic as 
a d-statistic, Lilliefors (1969) treats the case of testing F = 

exponential using a modified K-S statistic where the mean of the hypoth- 
sized exponential is estimated from the sample. 

Confidence bounds using can easily be obtained by noting that 

Pr { |F n (x)-F(x) | < d a ^ 2 (n)} = 1-a , (5.1) 
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where * s tabulated and depends on the level of confidence de- 

sired (1-a) and the sample size (n) . Rewriting 5.1 we have 

P r )-d 0/2 (n) < F n M-r(x) < d a/2 (n)j = 1-a , 

or 

Pr { 1 ~^ n ( x ) -d a/2 (°) < 1_F(x) < 1- ^ n (x)+d a/2 (n) ) = 1-a ♦ 

yielding as the lower and upper confidence bound curves 1-F (x) ± 

n 

d /2 (n) . Denoting the risk profile 1-F (x) by R ( x ) , we have 

n n 

A 

confidence bounds of R (x) ± d . (n) . 

n ot/ 2 

One problem is that the upper confidence bound approaches a con- 
stant value for small values of R^x) since d^Cn) is a fixed con- 

A 

stant added to the empirical value R (x) . A plot of the K-S bound 

n 

would look like that shown in Figure 5-1. 



Damage 

Figure 5-1. — K-S bounds. 

It should be pointed out that the type of statement we can make 
from such bounds is the following: 
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"With confidence 100(l-a)% , the probability of damage exceeding 
x Q lies between [l-F n (x Q )J ± d a/2 (n) for dll values x Q ." 

This is an extremely powerful statistical statement. 

5,3,2 Upper K-S bounds for the risk profile R(x) . 

We would generally be interested in only one-sided confidence 
bounds, namely, the upper bound on the risk profile. Since sample sizes 
are large, the asymptotic theory can be used — in which case the one- 
sided critical values when using the K— S statistic are as shown in 

A 

Table 5-2. Thus, for example, P{R(x) - R n (x) < 1.52/tfiT} = .99 , and 

to deal with tail probabilities of the order of 10 ^ with 99% confidence 
requires a sample satisfying 

= io -4 or n = 2.3 X 10 8 . 


TABLE 5-2 

SAMPLE SIZES FOR UPPER K-S 
CONFIDENCE BOUNDS 

Confidence Level 
.95 
.99 
.995 
.999 
.9995 
.9999 

Some other sample sizes required to obtain various confidence levels in 
probabilities of certain orders are given in Table 5-3. 


Half-Width 

1.22 / Jn 
1.52 / Jn 
1.63 / v£ 
1.86 / Jn 
1.95 / v£ 

2.i5 / ya 
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TABLE 5-3 

SAMPLE SIZES FOR VARIOUS TAIL PROBABILITIES AND CONFIDENCES 


Confidence 

Tail Probability of Interest 
10" 2 10 -3 10 -4 io" 5 

.95 

1.5X10 4 

1.5X10 6 

1.5X10 8 

1.5X10 10 

.99 

2.3X10 4 

2.3X10 6 

2.3X10 8 

2.3X10 10 

.995 

2.7X10 4 

2.7X10 6 

2.7X10 8 

2. 7X10 10 

.999 

3.5X10 4 

3.5X10 6 

3.5X10 8 

3. 5x10 10 

.9995 

j 

3.8X10 4 

3.8X10 6 

3.8X10 8 

3.8X10 10 

.9999 

4.6X10 4 

4.6xl0 6 

4.6X10 8 

4.6xl0 10 


Roughly speaking, to be able to make confidence statements about prob- 
abilities of the order of 10 8 , the width of the band must also be of 

that order, 10 8 . To achieve a one-sided K-S bound of half-width 

” d 2d 

10 requires approximately 10 observations. 

5.3.3 Modified Kolmogorov- 
Smirnov confidence 
regions 

One problem with K-S bounds is that for very small values of 
risk, the term d^(n) overwhelms R^(x) • It seems desirable to ob- 
tain confidence bounds such that d^(n) decreases with increasing x 
so that one obtains a picture such as that shown in Figure 5-2. There 
are some generalizations of the K-S statistic which might help in this 
egard. The Anderson- Darling (A-D) statistic can have the property of 
the bounds coming "in" at large (and small) values of x but these 
bounds are only approximate. Discussion of the A-D statistic may be 
found in Anderson and Darling (1952), and Durbin (1973). Another 
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generalization of K-S, called the generalized D statistic, has the 
bounds partially coming in at the ends — a sort of situation between 
K-S and the approximate A-D [see, for example, Dempster (1959) and 
Dwass (1959)]. 



Figure 5-2. — Desired bounds. 

In this section we investigate a method of constructing a simul- 
taneous confidence region which is narrower in the right tail. This 
enables one to be more confident in the right — tail probabilities at the 
expense of less confidence in the central and left-tail probabilities. 

Let R( # ) be the cumulative risk function: R(x) = P{risk exceeds 

x} . Let F(x) = 1 - R(x) be the CDF of the risk; R is estimated by 

A 

the empirical risk function R^ , based on a simulated sample of n 
accidents (or n years). 

It is desired to estimate R with a confidence region based upon 

a ^ 

, or equivalently F with a confidence region based on F^ . Be- 
cause we are primarily interested in the right tail, we would like a 
region which is narrower in the tail. A very desirable region would be 
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R < R + gR , (5.2) 

n n 

where $> 0 . If g = 9 , we are one order of magnitude wide. Unfor- 
tunately it is impossible to get a region of the form (5.2). Suppose 

is the largest of the n simulated values, then & (x) = 0 for 
n n 

x > ; but P{R(X^ = 0} < 1 and if R is continuous P{R(X^ = 

n n n 

0} = 0 . Thus P{R(x) < R^(x) + 3R n (x) for all x} = 0 . 

However, it is possible to construct a confidence region of the 

form 

/\ /v 

R < R + gR + a , (5.3) 

n n 

-2 -3 -4 

where typically we might choose =10 ,10 , or 10 and 1+3 = 

/10, 10, v^LOO; i.e., %, 1, or lh orders of magnitude. The equivalent 
region for F = 1 - R would be 

{l-F < «-F n ) (1+6) + a} - {f, < F + . 

Thus 

p|r(x) < R^(x) + 6R n (x) + a, for all x| = p|f^(x) < F(x) + * 

(5.4) 

By the standard distribution-free argument, (5.4) is independent of F , 
so for computation we can let F(x) = x , the uniform [0,1] distribu- 
tion. Let U (*) be the empirical distribution function of uniform 
n 

order statistics, then 

p{ E < R, + 8R„ + a} - P{fi n (*) ^ for all,} 

= p|u n (x) <bx + a, for all x| 

= p n (a,b) , (5.5) 

where b = l/(l+g) and a = (a+g)/(l+g) . The probability (5.5) can 
be calculated. Explicit formulae for it are given in Durbin (1973). 
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It is possible to find the sample size n which achieves 

, _ /ct+6 _ 1 _\ 

1_Y " ^n\l+3 ’ m ) 

for various combinations of a, 3> Y> e.g., y=.01, .05 ; 1+3 = 

0 *3 / 

10, Mo ; a=10 , 10 , 10 . This enables one to take a large 

enough simulated sample to achieve any desired precision in a simulta- 
neous confidence region. 

To illustrate the usefulness of the above modified K-S region, 
we consider an example. Figure 5-3 shows a typical simulated risk pro- 
file. If this is based on a sample of 500 observations, then the usual 
K-S 95% confidence region is 

R < R + l - 22 ■ = R + .05456 . 

7500 

A 

This region is shown in Figure 5-3. The modified region R < 2R + d 
can be computed using a formula for p^(a,b) from Durbin (1973); 


P n (a,b) 



Numerical methods can be used to solve the equation 

P 500 ^ ~ * ^ • 

The solution is a = .50375 . This gives a modified K-S 95% confidence 
region 

R < 2R + .0075 . 


This region is also included in Figure 5-3. Note that it gives much 
greater precision in the right tail. This is achieved at the sacrifice 
of precision in the right tail. 


It should be noted that the tail of an empirical CDF behaves like 
a Poisson process and consequently the properties of these modified K-S 
regions can be approximated by properties of Poisson processes; see 
Dwass (1974) and Pyke (1959). 
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Figure 5-3, — Simulated risk profile and confidence regions: risk profile 

R n ( ) based on 500 simulated observations; Kolmogorov- 

Smimov 95% upper bound R^ + .056 (s e ■); modified 

Kolmogorov-Smimov 95%> upper bound 2R + .0075 f-x-— * ). 
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5.4 C omparison of procedures 

The K-S procedure gives the greatest flexibility because it pro- 
vides simultaneous bounds, but this is paid for with a huge sample size 
requirement. The modified K-S procedure reduces this price somewhat. 

If a single cut-off value Xq can be fixed, for which a confi- 
dence region on R(Xq) is useful, this can be obtained using the 

binomial distribution, with considerably fewer observations than the 
K-S. However, it can be done only for a single point. 

If a desired mean coverage probability or coverage-guarantee 
pair can be predetermined, then the prediction interval approach may be 
used. However, data snooping is not allowed. 

A comparison of sample sizes required to achieve a precision of 

-4 

IQ with 99% confidence is given in Table 5-4. (Note that Mean Cover- 
age statements do not involve confidence.) 

6. Conservative Risk Profiles 


Engineering design has long used "safety factors" to ensure the 
adequacy and the safety of physical and electronic systems. This may 
be looked at as a heuristic, but operationally viable, way to deal with 
the uncertainty of the environment in which a given system will operate. 
Not having sufficient knowledge about the environment to design for it 
precisely, one takes a "conservative" approach in using safety factors 
to design for extreme conditions that might arise. One can then say 
that if there is an "error" in the design, then it is surely an "error 
on the conservative side." 

It seems desirable to follow a similar principle of conservatism 
in generating and reporting results of the GFRAP. Assuming that the 
GFRAP will ultimately report the risks associated with the use of CF in 
commerical aircraft to be "small," the results of the risk analysis will 
be more readily accepted if it can be stated that they overestimate the 
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TABLE 5-4 

REQUIRED SAMPLE SIZES FOR 99% CONFIDENCE 
STATEMENTS WITH 10 -4 PRECISION 


♦ Kolmogorov- Smirnov 



— 4 

R < R +10 
= n 

y 

n = 2.3 x 10 8 

A -2 
R < R + 10 
— n 

y 

n = 2.3 x 10 4 

• Modified Kolmogorov- Smirnov 

(Poisson approximation) 

R < 2R + 10 _4 
= n 

y 

n = 4.6 x 10 4 

/x —A 

• Binomial (R^x^) =10 ) 



R(x Q ) < R n (x Q ) + 10" 4 

y 

n = 5 .4 x 10 4 

•Mean Coverage 



p{next observation > 

n 

} = io “ 4 , 

4 

n = 10 

•Guaranteed Coverage 



P{ Damage > xj^} < 10 4 

y 

4 

n = 4.6 x 10 


true risks. It should be stated that it is not really possible to pre- 
sent the "true risks" because of modeling limitations, data inadequacies, 
and limited financial resources. 

The results of the risk analysis are summarized in two risk pro- 
files, the national annual risk profile and the national conditional 
(given one accident) risk profile. We therefore use the following 
concept of conservatism for risk profiles. If two risk profiles do 
not cross, it is justifiable to say that the one above and to the 
right of the other is more conservative (or pessimistic, or represents 
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a greater amount of risk). Figure 4-25 (in Section 4.4.3) illustrates 
such a case. The risk profile for r=2 is above and to the right of 
that for r = 1 • That the former is more pessimistic folllows from the 
following two observations: (1) For every damage value d , the pro 

file for r = 2 shows a higher probability of exceeding d than does 
the profile for r = 1 . (2) For every probability p , the profile for 
r = 2 shows a higher value d such that Prob (damage > d} = p than does 
the profile for r=l . 

Thus if we present GFRAP results in the form of risk profiles 
which are known to be above and to the right of the "true" unknown risk 
profiles, we present results which are conservative. This sectxon is 
concerned with methods by which such conservative risk profiles may be 
obtained. Section 6.1 formalizes the above concept of two risk profiles 
which do not cross through the concept of stochastic dominance. Section 
6.2 shows how this concept can be used operationally in the GFRAP in 
order to obtain conservative results, and Section 6.3 summarizes and 
interprets the preceding developments. Several theorems are stated in 
Section 6.2 without proof in order to conserve space. 

6.1 Stochastic dominance 

Stochastic dominance is a concept which has become very important 
in the area of decision making under uncertainty, but it also has useful 
application here to risk profiles. The following definitxons and proper- 
ties may be found in Whitemore and Findlay (1978) unless noted otherwise. 
We adopt the notation that F = 1-F for any CDF F . 

The basic definition of first degree stochastic dominance is the 

following: Let F and G be CDF’s. Now F G (F "stochastically 

dominates" G) if and only if F(x) > G(x) for all real values x . 

This says that F G if and only if the risk profile corresponding to 

the CDF F lies above (or at least not below) and to the right (or at 
least not to the left) of that corresponding to G ; i.e., the risk 
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profile corresponding to F is conservative relative to that correspond- 
ing to G . 

Another notation for indicating first degree stochastic dominance 
is in terras of random variables having the indicated CDF’s. Let X and 
Y be random variables with respective CDF’s F and G . One writes 

X Y if an d only if F S > G . 

A useful result is the folllowing: Let be the set of nonde- 

s t 

creasing functions on the real line. Then F > G if and only if 
/udF _> /udG for all functions u in for which these integrals 

exist. 

Suppose G represents one of the two ’’true” risk profiles that 
are desired among the final outputs of the GFRAP risk analysis. Not 
being able to determine G , as indicated above, we want to report a 

risk profile F such that F S > G . The general idea is to replace all 
estimated and projected quantities and probability distributions which 
enter the simulation model by ’’conservative" ones. The next section 
provides details. 

6 . 2 Obtaining conservative risk 
profiles 

We deal first with the national annual risk profile and relate it 
to the national conditional (given one accident) risk profile. Denote 
these by H and *F , respectively, and define the following random 
variables : 

= the damage done by the 1th accident that occurs 
during the year, i=l,2,... ; 

S = the total damage done by all accidents that occur 
during the year; 

Z = the number of accidents that occur during the 
year. 
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Then S has CDF H , each X^ has CDF F , the are mutually inde- 

pendent, and 


S = 



9 


where Xq is a random variable which is identically zero. 


Let the CDF of the discrete random variable Z be G , and con- 
sider what happens if G is replaced by G* , where G* is the CDF of 
another discrete nonnegative random variable, say Z* , such that 

G* S > G ; i.e. , Z* > Z . Let 


Z* 

S* = l 
i=0 


and let H* be the CDF of S* . Theorem 6.1 then states that H* H 
so that a conservative risk profile is obtained by replacing the CDF of 
Z by a CDF which is stochastically greater* 


Theorem 6.1: If G* 2 ? G , then H* S j> H , or equivalently, S* _> S • 

We have assumed that Z has a Poisson distribution with mean y . 
The next theorem states that increasing the value of y yields a CDF of 

G* for Z* such that G* G . 

Theorem 6.2: Suppose Z and Z* both have Poisson distributions with 

respective means y and y* . Then Z* Z , i.e., G* S > G , if and 
only if y* y . 

Theorems 6.1 and 6.2 together indicate that the use of a mean 
number of accidents per year greater than the true mean will yield a 
conservative risk profile. 
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Now suppose that the CDF F corresponding to the national condi- 
tional risk profile is replaced by a CDF F^ such that F^ ^ F . 

—if 

Clearly this will lead to a conservative risk profile, say H . For- 
mally, let (i=l,2,...) be mutually independent random variables 

with CDF F^ . Now define 


S 


// 



9 


and let be the CDF of S 


# 


Theorem 6,3: If F 


# 


st 

> 


then 




H 


or equivalently, S _>^ S 


For the GFRAP, the national conditional risk profile is the 

weighted average of the single-accident risk profiles at the N differ- 

a 

ent airports. Using the notation of Section 4.4.1, we have 


F(x) 


N 

a 


i=l 


p i r di (x) 


9 


or equivalently, F = Y.p.F . If some of the F , . are replaced by 

i l ax di 

CDF’s which dominate them it is clear that a conservative national condi- 
tional risk profile is obtained. Formally, we have Theorem 6.4. 


Theorem 6.4: If F^ ^ for i=l,...,N a , and ^ , 

then F^ ^ F . 

In Theorem 6.4 the p_^ values are held fixed. If they are al- 
lowed to vary, some must decrease while others increase since their sum 

remains equal to one. It is intuitive that if the p.’s which increase 

i 

correspond to CDF’s which dominate the CDF’s corresponding to the p.’s 
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which decrease, then a conservative national conditional risk profile is 
obtained. This is now formalized. Let N = {l,2 N a > and let I ,1 

be disjoint subsets of N . Define a new set of probability values p^ , 
i e N , such that 

P ± , if i e I + , 

< P ± . ^ i £l " • 

p^ = p^ , otherwise. 

Now let = \ p ± F^ d ; we obtain: 

Theorem 6.5: If F id ^ F^ d for every pair (i,j) such that iel 

// s t 

and j e I , then F *> F . 

Now we step back and concentrate on the CDF F^ of the damge 

per accident at airport i • We will show how to obtain a conservative 
estimate of its corresponding risk profile . We first drop the 

subscript i , for convenience, and to indicate that the analysis applies 
at each airport. 

Consider the random variable X , defined to be the damage done by 
an arbitrary accident at a particular airport; it has CDF F d . This X 

is the sum of the costs due to the failure of various electronic and elec- 
trical components and systems. We number all the components and systems 
which could be affected by an accident from 1 to M (a large finite num- 
ber) , and define the random variables 

Yj - 1 if component or system j fails, 

= 0 otherwise 

C. - the cost incurred if Y. = 1 . 

J J 
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the 


We assume the to be mutually independent and independent of 

Y . Now we write the damage X as 

j 


X 


M 

- I 

j=l 




From this expression it is clear that replacing the CDF of any cost 
by a CDF that dominates it will lead to a risk profile that is conserva- 
tive relative to F^ . Formally, let X* = * Then we have: 

Theorem 6.6: C* _>^ Cj for all j = l,«.*,M implies X* X . 


We clearly have an analogous result if the Y^ are replaced by Y* 

such that Y* >, Y. . Note that this means we replace the failure prob- 
3 =1 1 

ability of component j by a value at least as large. Now let X* = 

y.c.Y* . 

L J J 3 


Theorem 6. 


7: Y* Y for all j=l,...,M implies X* ^ X . 


Now we will concentrate on conditions which imply Y* Y^ 

for all j . According to the exponential failure model the probability 

distribution of Y. is specified by Pr(Y, = 1) = 1 - exp(-W ) , 
j J J 

where the random variable is defined by 

W. » T.E./E. , 

J 3 3 J 

and Ej is the mean exposure to failure of component j , E^ is the 

outside exposure applicable to component j , and is the overall 

transfer coefficient (or transfer function) applicable to component j 
in its individual environment. Here T\ , E_, , and E_. are all treated 

as (independent) random variables; E_. is indeed a random variable, and 
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T and E are treated as such because of a lack of sufficiently pre— 
3 3 

cise information to specify their values exactly. Note that use of the 
exponential failure model is itself a conservative assumption for the 
GFRAP (refer to Section 4.2.3). 

The next theorem follows from the result given in Section 6.1. 


Theorem 6.8: Specify Y* by Pr(Y* = 1) = 1 - exp(-W*) . Then 

17* _>^ Wj implies Y* Y^ . 


Now we concentrate on W 


j 


Let W* = T | E j/ E j • since T j » T j » 


E , E* , E and E* are all positive random variables one obtains: 
3 j 3 ’ 3 


Theorem 6. 9: If T* _>^ , E* E^ and E_. E* , then W* 

and so Y* >. Y. . 

3 =1 3 


The last thing we wish to do here is relate the exposure E^ to 

certain characteristics of the accident that produces this exposure. It 
is clearly reasonable to assume that, with other factors (such as wind 
direction and velocity, aircraft type, duration of the burn, etc.) held 
constant. 


E j * KP r F i ( *cf 5 

that is, E. is directly proportional to (K is the proportionality 
constant) the product of Q c f » the Quantity of CF composite on the 
aircraft, F ± , the fraction of the CF composite carried that is involved 
in the fire, and F r , the fraction of the CF burned that is released. 
Treating , F_^ , and F^ as random variables, and defining E* as 

the exposure obtained in place of E^ when , F^ , and F^ are 

replaced by Q*^ , F* , and F* , respectively, we obtain the following: 
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Theorem 6, 10: Q* f > ± Q cf , FJ > x > and F * F r implies E j i E j 
for all j=l, . . . ,M . 

6,3 Summary 

We have tried in this section to give some precision to the notion 
of a "conservative risk analysis" through the concepts of stochastic 
dominance and conservative risk profiles. In essence, we have shown that 
replacing the probability distributions used in the GFRAP risk simulation 
model by probability distributions which stochastically dominate them 
will lead to a conservative risk analysis. Given that modeling limita- 
tions, data inadequacies, etc. prevent us from determining the true risk 
profiles," we think it appropriate to report risk profiles that can be 
stated to be conservative relative to the "true" ones. 

In more concrete terms, for example, if the mean number of acci- 
dents per year y is projected to be, say, 2.6, but it is recognized 
that this projection may be in error, then an appronriatelv determined 
higher value, e.g., 3.0, should be used Instead in order to produce a 
conservative national annual risk profile. It would be appropriate to 
state that it is confidently believed that the true value of ]i does not 
exceed 3.0. 

The situation is somewhat different when it is desired to replace 

a random variable by a specific numerical value. For example, consider 

the case of F , the fraction of the CF burned that is released. It is 
r 

known that F varies from accident to accident and should be treated 
r 

as a random variable. But suppose that the experimental evidence pres- 
ently available indicates that values of F r above 0.01 are extremely 

improbable. It is then acceptable, and conservative, to use F^ = 

0.01 , stating that it is sufficiently confidently believed that F^ 

will not exceed 0.01 that such possibility has been deleted from the 
modeling effort. 
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